AFRL-IF-RS-TR-2005-38
Final  Technical  Report 
February  2005 


ON  METHODS  FOR  HIGHER  ORDER 
INFORMATION  FUSION 


IONA  College 


APPROVED  FOR  PUBLIC  RELEASE;  DISTRIBUTION  UNLIMITED. 


AIR  FORCE  RESEARCH  LABORATORY 
INFORMATION  DIRECTORATE 
ROME  RESEARCH  SITE 
ROME,  NEW  YORK 


STINFO  FINAL  REPORT 


This  report  has  been  reviewed  by  the  Air  Force  Research  Laboratory, 
Information  Directorate,  Public  Affairs  Office  (IFOIPA)  and  is  releasable  to  the 
National  Technical  Information  Service  (NTIS).  At  NTIS  it  will  be  releasable  to  the 
general  public,  including  foreign  nations. 


AFRL-IF-RS-TR-2005-38  has  been  reviewed  and  is  approved  for  publication 


APPROVED:  /s/ 

JOHN  J.  SALERNO,  JR. 
Project  Engineer 


FOR  THE  DIRECTOR:  /s/ 

JOSEPH  CAMERA,  Chief 

Information  &  Intelligence  Exploitation  Division 

Information  Directorate 


REPORT  DOCUMENTATION  PAGE 


Form  Approved 
OMB No. 0704-0188


Public  reporting  burden  for  this  collection  of  information  is  estimated  to  average  1  hour  per  response,  including  the  time  for  reviewing  instructions,  searching  existing  data  sources,  gathering  and 
maintaining  the  data  needed,  and  completing  and  reviewing  this  collection  of  information.  Send  comments  regarding  this  burden  estimate  or  any  other  aspect  of  this  collection  of  information,  including 
suggestions  for  reducing  this  burden  to  Washington  Headquarters  Services,  Directorate  for  Information  Operations  and  Reports,  1215  Jefferson  Davis  Highway,  Suite  1204,  Arlington,  VA  22202-4302, 
and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188), Washington, DC 20503


1.  AGENCY  USE  ONLY  (Leave  blank)  2.  REPORT  DATE  3.  REPORT  TYPE  AND  DATES  COVERED 

FEBRUARY  2005  Final  Oct  01  -  Sep  04 


4.  TITLE  AND  SUBTITLE  5.  FUNDING  NUMBERS 

ON METHODS FOR HIGHER ORDER INFORMATION FUSION  C - F30602-01-1-0593


6.  AUTHOR(S) 

Ronald  R.  Yager 


PE - 62702F
PR - FF8B
TA - II
WU - 08


7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES) 

Machine  Intelligence  Institute 
IONA  College 
715  North  Avenue 
New  Rochelle  New  York  10801 


8.  PERFORMING  ORGANIZATION 
REPORT  NUMBER 


9.  SPONSORING  /  MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 

Air  Force  Research  Laboratory/IFEA 

525  Brooks  Road 

Rome  New  York  13441-4505 


10.  SPONSORING  /  MONITORING 
AGENCY  REPORT  NUMBER 

AFRL-IF-RS-TR-2005-38 


11.  SUPPLEMENTARY  NOTES 


AFRL Project Engineer: John J. Salerno, Jr./IFEA/(315) 330-3667/ John.Salerno@rl.af.mil


12a.  DISTRIBUTION  /  AVAILABILITY  STATEMENT 

APPROVED  FOR  PUBLIC  RELEASE;  DISTRIBUTION  UNLIMITED. 


12b.  DISTRIBUTION  CODE 


13.  ABSTRACT  (Maximum  200  Words) 

The  primary  focus  of  this  effort  was  on  the  development  of  theory  useful  for  the  intelligent  fusion  of  information  and  its 
subsequent  use  in  situation  awareness  and  assessment.  This  work  was  focused  on  the  representation  of  complex  and 
imprecise  relationships  between  objects  as  well  as  facility  for  representing  and  manipulating  various  kinds  of 
uncertainties. 


14.  SUBJECT  TERMS 

Indicator  and  Warnings,  Situation  Awareness,  Fuzzy  Sets,  Fusion 


15.  NUMBER  OF  PAGES 

103

16.  PRICE  CODE 


17.  SECURITY  CLASSIFICATION  18.  SECURITY  CLASSIFICATION  19.  SECURITY  CLASSIFICATION  20.  LIMITATION  OF  ABSTRACT 
OF  REPORT  OF  THIS  PAGE  OF  ABSTRACT 


UNCLASSIFIED 


NSN  7540-01-280-5500 


UNCLASSIFIED 


UNCLASSIFIED 


Standard  Form  298  (Rev.  2-89) 

Prescribed  by  ANSI  Std.  Z39-18 
298-102 


TABLE  OF  CONTENTS 


SUMMARY . 1 

PART  I:  A  FRAMEWORK  FOR  MULTI-SOURCE  DATA  FUSION . 3 

1.  INTRODUCTION . 3 

2.  SOME  CONSIDERATIONS  IN  DATA  FUSION . 3 

3.  CHARACTERIZING  USER  REQUIREMENTS . 9 

4.  BASIC  CONCEPTUAL  FRAMEWORK  FOR  DATA  FUSION . 11 

5.  SOME  COMMON  COMPATIBILITY  RELATIONSHIPS . 13 

6.  SOME  FEATURES  ON  THE  FUNCTION  F . 16 

7.  ON  MONOTONICITY  IN  ORDINAL  SPACES . 17 

8.  CREDIBILITY  WEIGHTED  SOURCES . 19 

9.  ON  THE  INCLUSION  OF  REASONABLENESS . 21 

10.  GRANULAR  OBJECTS  AS  FUSED  VALUE . 24 

11.  MULTIPLE  GRANULAR  OBJECTS  AS  FUSED  VALUES . 29 

12.  REFERENCES  PART  1 . 30 

PART  II.  FUZZY  METHODS  IN  WEB  QUESTION  ANSWERING  SYSTEMS . 32 

1.  INTRODUCTION . 32 

2.  BASIC  CONCEPTS  OF  APPROXIMATE  REASONING . 34 

3.  SEMANTICS  OF  AR  THE  KNOWLEDGE  REPRESENTATION . 39 

4.  TRANSLATION  OF  INFORMATION . 42 

5.  INFERENCE  AND  KNOWLEDGE  MANIPULATION . 44 

6.  PROTOFORMS . 47 

7.  KNOWLEDGE  TREES . 50 

8.  NON-MONOTONIC  POSSIBILISTIC  PROPOSITIONS . 53

9.  DEFUZZIFICATION . 60 

10.  DEMPSTER-SHAFER  GRANULES  FOR  KNOWLEDGE  REPRESENTATION . 62 

11.  REFERENCES  PART  II . 65 

PART  III.  UNCERTAINTY  MANAGEMENT  FOR  INTELLIGENCE  ANALYSIS . 68 

1.  INTRODUCTION . 68 

2.  VARIABLES  AND  QUESTION  ANSWERING . 68 

3.  BASIC  KNOWLEDGE  REPRESENTATION  USING  FUZZY  SETS . 70 

4.  ON  THE  MEASURES  OF  POSSIBILITY  AND  CERTAINTY . 73 

5.  HEDGING  ON  OUR  DATA . 78 

6.  MULTI-SOURCE  INFORMATION  FUSION . 80 

7.  MULTIPLE  FUSED  VALUES  FROM  MULTI-SOURCE  DATA . 84 

8.  FUSING  PROBABILISTIC  AND  POSSIBILISTIC  DATA . 86 

9.  ALTERNATIVE  MEASURES  OF  CERTAINTY . 91 

10.  REFERENCES  PART  III . 96 


List  of  Figures 


Figure  1  Schematic  of  Data  Fusion . 4 

Figure  2.  Illustration  of  Saturation . 18 

Figure  3a.  Effect  of  Saturation  on  Monotonicity . 18 

Figure  3b.  Effect  of  Saturation  on  Monotonicity . 18

Figure  4.  Typical  Knowledge  Tree . 52 

Figure  5.  Subtree  from  Non-monotonic  Propositions . 60 

Figure  6.  Defuzzification  sub  tree . 62 

Figure  7.  Task  of  answering  a  question . 69 

Figure  8.  Different  relations  between  knowledge  and  question . 73 

Figure  9.  Quantifiers  parameterized  by  p . 94 


SUMMARY 

In  part  one  we  look  at  a  framework  for  the  multi-source  data  fusion  process.  Some  of  the 
considerations  and  information  that  must  go  into  the  development  of  a  multi-source  data  fusion 
algorithm  are  described.  Features  that  play  a  role  in  expressing  users’  requirements  are  also 
described.  We  provide  a  framework  for  data  fusion  based  on  a  voting  like  process  that  tries  to 
adjudicate  conflict  among  the  data.  We  discuss  the  idea  of  a  compatibility  relationship  and 
introduce  several  important  examples  of  these  relationships.  We  show  that  our  formulation 
results  in  some  bounding  conditions  on  the  fused  value  implying  that  the  fusion  process  has  the 
nature  of  a  mean  type  aggregation.  Situations  in  which  the  sources  have  different  credibility 
weights  are  considered.  We  present  a  concept  of  reasonableness  as  a  means  for  including  in  the 
fusion process any information available other than that provided by the sources. We consider the
situation  when  we  allow  our  fused  values  to  be  granular  objects  such  as  linguistic  terms  or 
subsets. 

In  part  two  we  consider  web  based  question-answering  systems.  We  point  out  that 
question-answering  systems  differ  from  other  information  seeking  applications,  such  as  search 
engines,  by  having  a  deduction  capability  or  an  ability  to  answer  questions  by  a  synthesis  of 
information  residing  in  different  parts  of  its  knowledge  base.  This  capability  requires  appropriate 
representation  of  various  types  of  human  knowledge,  rules  for  locally  manipulating  this 
knowledge  and  a  framework  for  providing  a  global  plan  for  appropriately  mobilizing  the 
information  in  the  knowledge  base  to  address  the  question  posed.  We  suggest  some  tools  to 
provide  these  capabilities.  We  describe  how  the  fuzzy  set  based  theory  of  approximate  reasoning 
can  aid  in  the  process  of  representing  knowledge.  We  discuss  how  protoforms  can  be  used  to  aid 
in  deduction  and  the  local  manipulation  of  knowledge.  The  idea  of  a  knowledge  tree  is  introduced 
to  provide  a  global  framework  for  mobilizing  the  knowledge  base  in  response  to  a  query.  We 
look  at  some  types  of  commonsense  and  default  knowledge.  This  requires  us  to  address  the 
complexity  of  the  non-monotonicity  that  these  types  of  knowledge  often  display.  We  also  briefly 
discuss  the  role  that  Dempster-Shafer  structures  can  play  in  representing  knowledge. 

Considerable concern has arisen regarding the quality of intelligence analysis. This has
been in large part motivated by the task, prior to the Iraq war, of determining whether Iraq had
weapons of mass destruction. One problem that made this analysis difficult was the uncertainty in
much of the information available to the intelligence analysts. In part three we introduce some
tools that can be of use to intelligence analysts for representing and processing uncertain
information. We make considerable use of technologies based on fuzzy sets and related
disciplines such as approximate reasoning.




PART  I:  A  Framework  for  Multi-Source  Data  Fusion 

1.  Introduction 

With the rapid growth of the Internet and other electronic sources of information capture,
the problem of the coherent merging of information from multiple sources has become an
important issue. This problem has many manifestations ranging from data mining to information
retrieval to multi-sensor fusion [1-4]. Here we consider one type of problem from this class. We
shall consider the situation in which we have some attribute variable, whose value we are
interested in supplying to a user. In our situation we have multiple sources providing data values
for this variable as input to our system. We call this a data fusion problem. Even this limited
example of multi-source information fusion is a very expansive problem. Here we shall restrict
ourselves to the discussion of some issues related to this problem.

2.  Some  Considerations  in  Data  Fusion 

The process of data fusion is initiated by a request for information about an attribute
variable to our sources of information. Let V be an attribute variable whose value lies in the set X,
called the universe of V. We assume a collection S_1, S_2, ..., S_q of information sources. Each
source provides a value which we call our data. The problem here becomes the fusion of these
pieces of data to obtain a fused value appropriate for the user's requirements. The approaches and
methodologies available for solving this problem depend upon various considerations, some of
which we shall outline in the following. In figure 1 we provide a schematic framework of this
multi-source data fusion problem which we use as a basis for our discussion.

Our  fusion  engine  combines  the  data  provided  by  the  information  sources  using  various 
types  of  knowledge  it  has  available  to  it.  We  emphasize  that  the  fusion  process  involves  use  of 
both  the  data  provided  by  the  sources  as  well  as  other  knowledge.  This  other  knowledge  includes 
both context knowledge and user requirements.

Figure 1. Schematic of Data Fusion


Let us begin to look at some of the considerations that affect the mechanism that is used
by the fusion engine. One important consideration in the implementation of the fusion process is
related to the form, with respect to its certainty, in which the source provides its information.
Consider the problem of trying to determine the age of John. The most certain situation is when a
source reports a value that is a member of X, for example, John's age is 23. Alternatively, the reported value
can include some uncertainty. It could be a linguistic value such as John is "young." It could
involve a probabilistic expression of the knowledge. Other forms of uncertainty can be associated
with the information provided. We note that the fuzzy measure [5, 6] and the Dempster-Shafer belief
functions [7, 8] provide two general frameworks for representing uncertain information. In the
following, unless otherwise specified, we shall assume the information provided by a source is a
specific value in the space X.

Another  consideration  is  the  inclusion  of  source  credibility  information  in  the  process. 
Source  credibility  is  a  user  generated  or  sanctioned  knowledge  base.  It  associates  with  the  data 
provided  by  the  source  a  weight  indicating  its  credibility.  This  credibility  weight  must  be  drawn 
from  a  scale  that  can  be  either  ordinal  or  numeric.  That  is,  there  must  be  some  ordering  of  the 
credibility  values.  The  process  of  assigning  the  credibility  to  the  data  reported  by  a  source  can 
involve  various  degrees  of  sophistication.  For  example,  degrees  of  credibility  can  be  assigned 
globally  to  each  of  the  sources.  Alternatively,  source  credibility  can  be  dependent  upon  the  type 
of  variables  involved.  For  example,  one  source  may  be  very  reliable  with  information  about  ages 
while not very reliable with information about a person's income. Even more sophisticated
distinctions can be made; for example, a source could be good about information on high-income
people but bad about low-income people.

As  we  noted  the  information  about  credibility  must  be  at  least  ordered.  It  may  or  may  not 
be  expressed  using  a  well  defined  bounded  scale.  For  example,  the  credibility  information  may  be 
expressed  by  ordering  of  the  sources  themselves.  Alternatively  the  credibility  of  a  source  may  be 
expressed  in  terms  of  the  assignment  of  a  value  drawn  from  a  well  defined  scale  having  an  upper 
and  lower  element.  Generally  in  the  case  when  the  credibility  is  selected  from  such  a  scale  the 
assignment  of  the  upper  value  to  a  source  indicates  that  the  data  should  be  given  full  weight.  An 
assignment of the lowest value generally means don’t use the value; this implies the value should
have little or no influence in the fusion process. In some ways the association of a less than
complete credibility with a source is closely related to a kind of uncertainty assigned to the value
provided by the source.

There exists an interesting special situation with respect to credibility, in which some
sources may be considered as disinformative or misleading. Here the lowest value on the
credibility scale can be used to correspond to some idea of taking the "opposite" of the value
provided by the source rather than assuming the data provided is of no value. This is somewhat
akin to the relationship between false and complementation in logic.

Central  to  the  problem  of  data  fusion  is  the  issue  of  conflict  and  its  resolution.  The 
proximity  knowledge  base  and  the  reasonableness  knowledge  base  play  important  roles  in  the 
handling  of  this  issue. 

One obvious form of conflict arises when we have multiple readings of a variable that may
not be the same or even compatible. For example, one source may say Osama bin Laden is 35
years old, another may say he is 45 and another may say he is 55. We shall refer to this as data
conflict. As we shall subsequently see, the proximity knowledge base plays an important role in
issues related to the adjudication of this kind of conflict.

There exists another kind of conflict, one that can occur even when we have only a single
reading for a variable. This can occur when a reported reading conflicts with what we know to be
the case, that is, with what is reasonable. Assume that when searching for the age of Osama bin Laden one of the
sources reports that he is eighty years old. This conflicts with what we know to be reasonable.
This is information that we consider to have a higher priority than any information provided by
any of the sources. In this case our action is clear: we discount this observation. We shall call
this a context conflict; it relates to a conflict with information available to the fusion process
external to the data provided by the sources. The repository of this higher priority information is
what we have indicated as the knowledge of reasonableness in figure 1. This type of a priori
context or domain knowledge can take many forms and can be represented in different ways.

As an illustration of one method of handling this type of domain knowledge, we
assume our knowledge base is in the form of a mapping over the domain of V, more specifically
a mapping R: X → T called the reasonableness mapping. We allow this to capture the
information we have, external to the data, about the possibilities of the different values in X being
the actual value of V. Thus for any x ∈ X, R(x) indicates the degree of reasonableness of x.
Typically, T is the unit interval I = [0, 1]. In this case R(x) = 1 indicates that x is a completely
reasonable value while R(x) = 0 means x is completely unreasonable. More generally T can be an
ordered set T = {t_1, ..., t_n} which has a smallest and a largest element, denoted 0 and 1. R can be viewed
as a fuzzy subset of reasonable values.
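
As a minimal sketch of such a mapping, assume V is a person's age and T is the unit interval. The trapezoidal shape and the breakpoints below are illustrative assumptions on our part and are not taken from the report; any function into [0, 1] that encodes the available context knowledge would serve.

```python
def reasonableness(age: float) -> float:
    """Hypothetical reasonableness mapping R: X -> [0, 1] for an age variable.

    The breakpoints are illustrative assumptions, not values from the report.
    """
    lo_zero, lo_one, hi_one, hi_zero = 18.0, 40.0, 60.0, 75.0
    if age <= lo_zero or age >= hi_zero:
        return 0.0  # completely unreasonable
    if lo_one <= age <= hi_one:
        return 1.0  # completely reasonable
    if age < lo_one:
        # rising edge between lo_zero and lo_one
        return (age - lo_zero) / (lo_one - lo_zero)
    # falling edge between hi_one and hi_zero
    return (hi_zero - age) / (hi_zero - hi_one)
```

With these assumed breakpoints a reported age of 80 receives reasonableness 0, so that reading would carry no support in the fusion process.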

The  reasonableness  mapping  R  provides  for  the  inclusion  of  information  about  the  context 
in  which  we  are  performing  the  fusion  process.  Any  data  provided  by  a  source  should  be 
acceptable  given  our  external  knowledge  about  the  situation.  The  use  of  the  reasonableness  type 
of  relationship  clearly  provides  a  very  useful  vehicle  for  including  intelligence  in  the  process. 

In the data fusion process, this knowledge of reasonableness often interacts with the
source credibility in an operation which we shall call reasonableness qualification. A typical
application of this is described in the following. Assume we have a source S_i that provides a data
value a_i and has credibility t_i. Here we use the mapping R to inject the reasonableness, R(a_i),
associated with the value a_i and then use it to modify t_i to give us z_i, the support for data value a_i
that came from source S_i. The process of obtaining z_i from t_i and R(a_i) is denoted z_i = g(t_i,
R(a_i)), and is called reasonableness qualification. In the following we shall suppress the indices
and denote this operator as z = g(t, r) where r = R(a). For simplicity we shall assume t and r are
from the same scale.

Let us indicate some of the properties that should be associated with this operation. A
first property universally required of this operation is monotonicity: g(t, r) ≥ g(t̂, r̂) if t ≥ t̂ and r ≥ r̂.
A second property that is required is that if either t or r is zero, the lowest value on the scale, then
g(t, r) = 0. Thus if we have no confidence in the source or the value it provides is not reasonable,
then the support is zero. Another property that can be associated with this operation is symmetry,
g(t, r) = g(r, t), although we may or may not require this of all manifestations of the
operation.

The essential semantic interpretation of this operation is one of saying that in order to
support a value we desire it to be reasonable and emanating from a source in which we have
confidence. This essentially indicates this operation is an "anding" of the two requirements.
Under this situation a natural condition to impose is g(t, r) ≤ Min[t, r].
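
The properties above can be spot-checked numerically. The following is a minimal sketch, assuming t and r are both drawn from the unit interval; the two candidate operators (minimum and product) are our own illustrative choices and the report does not single out either one.

```python
from itertools import product

GRID = [i / 4 for i in range(5)]  # sample points 0, 0.25, 0.5, 0.75, 1


def g_min(t: float, r: float) -> float:
    """Candidate qualification operator: minimum of credibility and reasonableness."""
    return min(t, r)


def g_product(t: float, r: float) -> float:
    """Candidate qualification operator: product of credibility and reasonableness."""
    return t * r


def satisfies_properties(g) -> bool:
    """Check the stated properties of g on the sample grid (a spot check, not a proof)."""
    for t, r in product(GRID, repeat=2):
        if g(t, r) > min(t, r):  # "anding" bound g(t, r) <= Min[t, r]
            return False
        if (t == 0 or r == 0) and g(t, r) != 0:  # zero input gives zero support
            return False
        if g(t, r) != g(r, t):  # symmetry (optional in the text)
            return False
        for t2, r2 in product(GRID, repeat=2):
            if t >= t2 and r >= r2 and g(t, r) < g(t2, r2):  # monotonicity
                return False
    return True
```

Both candidates pass the check on this grid, whereas an operator such as the maximum fails the bound g(t, r) ≤ Min[t, r].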

Relationships conveying information about the congeniality¹ between values in the
universe X in the context of their being the value of V play an important role in the development
of data fusion systems. Generally, these types of relationships convey information about the
compatibility and interchangeability between elements in X and as such are fundamental to the
resolution and adjudication of internal conflict. Without these relationships conflict can't be
resolved. In many applications underlying congeniality relationships are implicitly assumed. A
common example is the use of least-squares-based methods. The use of linguistic concepts and
other granulation techniques is based on these relationships [9, 10]. Clustering operations
require these relationships. These relationships are related to equivalence relationships and
metrics. Central to these relationships are the properties of reflexivity (a value is congenial to
itself) and symmetry.

For our purposes we shall find the concept of a proximity relationship [11] useful in
discussing data fusion. Formally, a proximity relationship on a space X is a mapping
Prox: X × X → T having the properties: (1) Prox(x, x) = 1 (reflexive) and (2) Prox(x, y) =
Prox(y, x) (symmetric). Here T is an ordered space having largest and smallest elements denoted 1
and 0. Often T is the unit interval. Intuitively the value Prox(x, y) is some measure of the degree to
which the values x and y are compatible and non-conflicting with respect to the context in which the
user is seeking the value of V. The concept of metric or distance is related in an inverse way to
the concept of proximity.

¹ We use the term congeniality to indicate relationships like proximity, similarity, equivalence or distance.

A closely related and stronger idea is the concept of a similarity relationship as introduced
by Zadeh [12, 13]. Formally, a similarity relationship on a space X is a mapping Sim: X × X → T
having the properties

1) Sim(x, x) = 1 (reflexive)

2) Sim(x, y) = Sim(y, x) (symmetric)

3) Sim(x, z) ≥ Sim(x, y) ∧ Sim(y, z) (transitive)

Here ∧ denotes the minimum. A similarity relationship adds the additional requirement of transitivity. Similarity relationships
provide a generalization of the concept of equivalence relationships.

A fundamental distinction between proximity and similarity relationships is the following.
In a proximity relationship x and y can be related and y and z can be related without x and
z being related. In a similarity relationship under the stated premise a relationship must also exist
between x and z.
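
This distinction can be illustrated on a small finite universe. The sketch below assumes T is the unit interval and reads the transitivity condition as Sim(x, z) ≥ Min[Sim(x, y), Sim(y, z)]; the relation CHAIN and its values are hypothetical and chosen only to show a proximity relationship that is not a similarity relationship.

```python
def is_proximity(rel: dict, xs: list) -> bool:
    """Reflexive and symmetric on the finite universe xs."""
    reflexive = all(rel[x][x] == 1 for x in xs)
    symmetric = all(rel[x][y] == rel[y][x] for x in xs for y in xs)
    return reflexive and symmetric


def is_similarity(rel: dict, xs: list) -> bool:
    """A proximity relationship that is also max-min transitive."""
    transitive = all(
        rel[x][z] >= min(rel[x][y], rel[y][z])
        for x in xs for y in xs for z in xs
    )
    return is_proximity(rel, xs) and transitive


XS = ["x", "y", "z"]

# Hypothetical relation: x is close to y, y is close to z, but x and z are unrelated.
CHAIN = {
    "x": {"x": 1.0, "y": 0.8, "z": 0.0},
    "y": {"x": 0.8, "y": 1.0, "z": 0.8},
    "z": {"x": 0.0, "y": 0.8, "z": 1.0},
}
```

Here CHAIN passes is_proximity but fails is_similarity, since the relatedness of x to y and of y to z does not carry over to x and z.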

In situations in which V takes its value from a numeric scale, the basis of the
proximity relationship is the difference |x - y|. However, the mapping of |x - y| into Prox(x, y) may
be highly non-linear.
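
One possible non-linear mapping is sketched below. The Gaussian-style form and the scale parameter are our own assumptions for illustration; the report does not prescribe a particular mapping, only that it be reflexive and symmetric.

```python
import math


def prox_numeric(x: float, y: float, scale: float = 10.0) -> float:
    """Hypothetical proximity on a numeric scale, built from |x - y|.

    Returns 1 when x == y and decays non-linearly toward 0 as the
    difference grows; `scale` (an assumed parameter) sets how fast.
    """
    return math.exp(-((abs(x - y) / scale) ** 2))
```

For the age readings 35, 45 and 55 discussed earlier, this mapping makes 35 and 45 far more compatible than 35 and 55, so the choice of scale directly shapes how conflict between readings is judged.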

For attributes assuming non-numeric values a relationship of proximity is based on
relevant features associated with the elements in the variable's universe. Here we can envision a
variable having multiple appropriate proximity relationships. As an example let V be the country
in which John was born; its domain X is the collection of all the countries of the world. Let us see
what types of proximity relationship can be introduced on X in this context. One can consider the
continent in which a country lies as the basis of a proximity relationship; this would actually
generate an equivalence relationship. More generally, the physical distance between countries
can be the basis of a proximity relationship. The spelling of the country's name can be the basis
of a proximity relationship. The primary language spoken in a country can be the basis of a
proximity relationship. We can even envision notable topographic or geographic features as the
basis of proximity relationships. Thus, many different proximity relationships may occur. The
important point here is that the association of a proximity relationship with the domain of a
variable can be seen as a very creative activity. More importantly, the choice of proximity
relationship can play a significant role in the resolution of conflicting information.

A primary consideration that affects the process used by the fusion engine is what we shall
call the compositional or matrimonial nature of the values of the variable V. This characteristic
plays an important role in the types of operations that are available to use during the fusion
process. It determines what types of aggregations we can perform. This concept is closely related
to the idea of scale and measurement, a topic studied extensively in the field of mathematical
psychology. For our purposes we shall distinguish between three types of variables with respect
to this characteristic. The first type of variable is what we shall call celibate. The word nominal
can also be used here. These are variables for which the composition of multiple values is
meaningless. An example of this type of variable is a person's name. Here the process of
combining values is completely inappropriate. More formally these are variables whose universe
has no mathematical structure. These variables only allow comparison as to whether they are
equal.

A more structured type of variable is an ordinal variable. For these types of variables
there exists some kind of meaningful ordering of the members of the universe. An example of
this is a variable corresponding to size that has as its universe {small, medium, large}. For these
variables some kind of compositional process is allowable; for example, combining small and large to obtain
medium is meaningful. Here composition operations must be based on ordering.

The  most  structured  type  of  variable  is  a  numeric  variable.  For  these  variables  in  addition 
to  ordering  we  have  the  availability  of  all  the  arithmetic  operators.  This  of  course  allows  us  a 
great  degree  of  freedom  and  a  large  body  of  compositional  operators. 
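
The three variable types can be contrasted with a small sketch. The particular operators chosen here (most frequent value, lower median, arithmetic mean) are our own illustrative assumptions about what each structure permits; they are not the fusion operators developed in this report.

```python
from collections import Counter
from statistics import mean, median_low

SIZE_SCALE = ["small", "medium", "large"]  # ordinal universe, lowest to highest


def fuse_nominal(values: list) -> str:
    """Nominal (celibate): only equality is available, so select the most frequent value."""
    return Counter(values).most_common(1)[0][0]


def fuse_ordinal(values: list, scale: list) -> str:
    """Ordinal: only the ordering is used, here through the lower median of the ranks."""
    ranks = [scale.index(v) for v in values]
    return scale[median_low(ranks)]


def fuse_numeric(values: list) -> float:
    """Numeric: arithmetic is available, so averaging is meaningful."""
    return mean(values)
```

Note that the nominal case selects one of the reported values rather than composing them, and that with an even number of ordinal reports some tie-breaking rule is needed; the sketch arbitrarily takes the lower of the two middle ranks.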


3.  Characterizing  User  Requirements 

The output of any fusion process must be guided by the needs, requirements and desires of
the user. In the following we shall describe some considerations and features that can be used to
define or express the requirements of the user.

An important consideration in the presentation of output is the user's level of conflict
tolerance. Conflict tolerance is related to the multiplicity of possible values presented to the user.
Does the user desire one unique value, or is it appropriate to provide him with a few solutions, or is
the presentation of all the multi-source data appropriate?

Another closely related issue relates to the level of granulation. As described by Zadeh
[14] a granule is a collection of values drawn together by proximity of various types. Linguistic
terms such as cold and old are granules corresponding to a collection of values whose proximity is
based on the underlying variable. In providing information we must satisfy the user's level of
granularity for the task at hand. Here we are not referring to the number of solutions provided but
the nature of each solution object. One situation is that in which each solution presented to the
user must be an element from the domain X. Another possibility is one in which we can
provide, as a single solution, a subset of closely related values. Presenting ranges of values is an
example of this. Another situation is where we use a vocabulary of linguistic terms to express
solutions. An example is where using a term such as "cold" as the value for the temperature
would be acceptable.

Another issue related to the form of the output is whether all output values presented to the
user are required to be values that appear in the input, or whether we can blend output values using
techniques such as averaging to construct new values that didn’t appear in the input. A closely
related issue is the reasonableness of the output. For example, consider the attempt to determine
the number of children that John has. Assume one source says 8 and another says 7; taking the
average gives us 7.5. Clearly it is impossible for our John to have 7.5 children. In addition,
we should note that sometimes the requirement for reasonableness may be different for the output
than for the input.
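
One way to respect output reasonableness in the children example is sketched below: the average is computed and then replaced by the nearest whole number lying between the smallest and largest reported values. Both this projection step and the rule of breaking ties toward the smaller value are our own assumptions for illustration.

```python
def fuse_count(values: list[int]) -> int:
    """Fuse integer counts so that the output is itself a valid count.

    Averages the inputs, then returns the nearest integer within the
    reported range; ties go to the smaller value (an arbitrary choice).
    """
    avg = sum(values) / len(values)
    candidates = range(min(values), max(values) + 1)
    return min(candidates, key=lambda c: (abs(c - avg), c))
```

For the reports 8 and 7 the raw average is 7.5, and this sketch returns 7, a value that John could actually have.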

Another feature of the output revolves around the issue of qualification. Does the user
desire qualifications associated with suggested values or does he prefer no qualification? As we
discussed earlier, input values to a fusion system often have attached values of credibility, this
being due to the credibility of the source and the reasonableness of the data provided.
Considerations related to the presentation of this credibility arise regarding the requirements of
the user. Are we to present weights of credibility with the output or present it without these
weights? In many techniques, such as weighted averaging, the credibility weight gets subsumed
in the fusion process.
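
The way weighted averaging subsumes credibility can be seen in a minimal sketch, assuming numeric data values and non-negative numeric credibility weights; the function name and the example weights are hypothetical.

```python
def weighted_average(values: list[float], weights: list[float]) -> float:
    """Credibility-weighted average of numeric data values.

    The weights shape the result but do not appear in the output, so the
    user receives a single unqualified value.
    """
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one source must have non-zero credibility")
    return sum(v * w for v, w in zip(values, weights)) / total
```

With values 35, 45 and 55 and assumed weights 1.0, 0.5 and 0.0, the third source has no influence and the result lies between 35 and 45, closer to 35; the output itself carries no trace of the weights that produced it.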

In most cases the fusion process should be deterministic: a given informational situation
should always result in the same fused value. In some cases we may allow for a
non-deterministic, random mechanism in the fusion process. For example, in situations in which some
adversary may have some role in affecting the information used in the fusion process we may
want to use randomization to blur and confuse the influence of their information.


4.  Basic  Conceptual  Framework  for  Data  Fusion 

Here  we  shall  provide  a  basic  framework  in  which  to  view  and  implement  the  data  fusion 
process.  We  shall  see  that  this  framework  imposes  a  number  of  properties  that  should  be  satisfied 
by  a  rational  data  fusion  technology. 

Consider a variable V having an underlying universe X. Consider the situation in which
we have a collection of q assessments of the variable, inputs to our fusion engine. Each
assessment is information supplied by one of our sources. Let $a_i$ be the value provided by the
$i$th source. Our desire here is to fuse these values to obtain some value $\tilde{a} \in X$ as the
fused value. We denote this as $\tilde{a} = \mathrm{Agg}(a_1, \ldots, a_q)$. The issue then becomes that of
obtaining the operator Agg that fuses these pieces of data. One obvious requirement of such an
aggregation operator is idempotency: if all $a_i = a$ then $\tilde{a} = a$.

In  order  to  obtain  acceptable  forms  for  Agg  we  must  conceptually  look  at  the  fusion 
process.  At  a  meta  level  multi-source  data  fusion  is  a  process  in  which  the  individual  sources 
must  agree  on  a  solution  that  is  acceptable  to  each  of  them,  that  is  compatible  with  the  data  they 
each  have  provided. 

Let a be a proposed solution, some element from X. Each source can be seen as "voting"
whether to accept this solution. Let us denote $\mathrm{Sup}_i(a)$ as the support for solution a from
source i. We then need some process of combining the support for a from each of the sources. We let

$\mathrm{Sup}(a) = F(\mathrm{Sup}_1(a), \mathrm{Sup}_2(a), \ldots, \mathrm{Sup}_q(a))$

be the total support for a. Thus F is some function that combines the support from each of the
sources. The aggregated value $\tilde{a}$ is then obtained as the value $a \in X$ that maximizes Sup(a).
Thus $\tilde{a}$ is such that $\mathrm{Sup}(\tilde{a}) = \max_{a \in X}[\mathrm{Sup}(a)]$.
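As an illustrative sketch only, the maximization just described can be written in a few lines of
Python. The function name fuse, the finite candidate list, the particular support function
closeness, and the choice of sum for F are all assumptions made here for illustration; the text
above leaves both the individual support and F general.

```python
def fuse(candidates, data, support, combine=sum):
    """Return (best candidate, its total support) over a finite candidate set.

    support(a, a_i) plays the role of Sup_i(a) and combine plays the role of F.
    Both are hypothetical stand-ins chosen for illustration.
    """
    best, best_sup = None, None
    for a in candidates:
        total = combine([support(a, a_i) for a_i in data])
        if best_sup is None or total > best_sup:
            best, best_sup = a, total
    return best, best_sup


def closeness(a, a_i, scale=10.0):
    # Assumed support: 1 when a == a_i, falling linearly to 0 at distance `scale`.
    return max(0.0, 1.0 - abs(a - a_i) / scale)


# Three sources report 3, 4 and 8; candidates are the integers 0..10.
fused, total = fuse(range(0, 11), [3, 4, 8], closeness)

# Idempotency check: identical inputs should return that same value.
same, _ = fuse(range(0, 11), [6, 6, 6], closeness)
```

Under these assumed choices the sketch picks the candidate with the smallest summed distance to
the inputs; other choices of support and F would give other fused values.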

One natural property associated with F is that the more support from the individual
sources the more overall support for a. Formally if a and b are two values and if
$\mathrm{Sup}_i(a) \geq \mathrm{Sup}_i(b)$ for all i then $\mathrm{Sup}(a) \geq \mathrm{Sup}(b)$. This requires that F be a monotonic
function, $F(x_1, x_2, \ldots, x_q) \geq F(y_1, y_2, \ldots, y_q)$ if $x_i \geq y_i$ for all i. A slightly stronger
requirement is strict monotonicity. This requires that F be such that if $x_i \geq y_i$ for all i and
there exists at least one i such that $x_i > y_i$ then $F(x_1, \ldots, x_q) > F(y_1, \ldots, y_q)$.
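A small check may help separate the two notions. In the sketch below, sum and max are used as
two candidate forms of F; these particular choices and the sample numbers are assumptions made
for illustration, not forms singled out at this point in the text.

```python
# Two argument lists with x_i >= y_i for every i and x_1 > y_1.
x = (0.5, 0.9)
y = (0.2, 0.9)

dominates = all(xi >= yi for xi, yi in zip(x, y))

# sum reacts to the strict improvement in the first argument ...
sum_is_strict = sum(x) > sum(y)

# ... while max is monotonic but returns the same value for both lists.
max_is_flat = max(x) == max(y)
```

Here sum behaves as a strictly monotonic F, whereas max satisfies only the weaker monotonicity
condition.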

Another condition we can associate with F is a symmetry with respect to the arguments.
That is, the indexing of the arguments should not affect the answer. This symmetry implies a
more expansive situation with respect to monotonicity. Assume $t_1, \ldots, t_q$ and
$\hat{t}_1, \ldots, \hat{t}_q$ are two sets of arguments of F, $\mathrm{Sup}_i(a) = t_i$ and $\mathrm{Sup}_i(\hat{a}) = \hat{t}_i$. Let
perm indicate a permutation of the arguments, where perm(i) is the index of the $i$th element
under the permutation. Then if there exists some permutation such that
$t_i \geq \hat{t}_{\mathrm{perm}(i)}$ for all i we get

$F(t_1, \ldots, t_q) \geq F(\hat{t}_1, \ldots, \hat{t}_q)$.

Let us look further into this framework. A source's support for a solution, $\mathrm{Sup}_i(a)$, depends
upon the degree of compatibility between the proposed solution a and the value provided by the
source, $a_i$. Let us denote $\mathrm{Comp}(a, a_i)$ as this compatibility. Thus $\mathrm{Sup}_i(a)$ is some function of the
compatibility between $a_i$ and a. Furthermore, we have a monotonic type of relationship. For any
two values a and b if $\mathrm{Comp}(a, a_i) \geq \mathrm{Comp}(b, a_i)$ then $\mathrm{Sup}_i(a) \geq \mathrm{Sup}_i(b)$.

The compatibility between two objects in X is based upon some underlying proximity
relationship. The concept of a proximity relationship, which we introduced earlier, has been
studied in the fuzzy set literature [11, 15, 16]. Here then we shall assume a relationship Comp,
called the compatibility relationship, which has at least the properties of a proximity relationship.
Thus $\mathrm{Comp}: X \times X \to T$ in which T is an ordered space with greatest and least elements denoted
1 and 0 and having the properties: (1) Comp(x, x) = 1 and (2) Comp(x, y) = Comp(y, x). A
suitable, although not necessary, choice for T is the unit interval.

We see that this framework imposes an idempotency type condition on the aggregation
process. Assume $a_i = a$ for all i. In this case $\mathrm{Comp}(a, a_i) = 1$ for all i. From this it follows that
for any $b \in X$, $\mathrm{Comp}(b, a_i) \leq \mathrm{Comp}(a, a_i)$, hence $\mathrm{Sup}_i(a) \geq \mathrm{Sup}_i(b)$ for all i, thus
$\mathrm{Sup}(a) \geq \mathrm{Sup}(b)$ for all b. Thus there can never be a better solution than a. Furthermore, if F
is assumed strictly monotonic and Comp is such that $\mathrm{Comp}(a, b) \neq 1$ for $a \neq b$ then we get a
strict idempotency.

We now introduce the idea of a solution set and the related idea of minimal solution set.
We say that a subset G of X is a solution set if all a s.t. $\mathrm{Sup}(a) = \max_{x \in X}[\mathrm{Sup}(x)]$ are contained
in G. We shall say that a subset H of X is a minimal solution set if there always exists one
element $a \in H$ s.t. $\mathrm{Sup}(a) = \max_{x \in X}[\mathrm{Sup}(x)]$. Thus a minimal solution set is a set in which we
can always find an acceptable fused value.


5.  Some  Common  Compatibility  Relationships 

We now look at some very important special examples of compatibility relationships and
show that in these cases there exist minimal solution sets that are easily definable in
terms of the data to be fused. These minimal solution sets can be seen as boundaries on the subset
of elements of X in which we look to find a solution. These boundaries seem to reflect a general
idea that the aggregation procedure has the nature of a mean (averaging) operator.

The first situation we consider is a very strong compatibility requirement. Here we
assume Comp(a, b) = 1 if a = b and Comp(a, b) = 0 if $a \neq b$. This is a very special kind of
equivalence relationship: elements are only equivalent to themselves.

Let B be the subset of X containing the input data values, $B = \{b \mid \exists\, a_i = b\}$. Let d be
some element not in B, $d \in X - B$. In this case $\mathrm{Comp}(d, a_i) = 0$ for all $a_i$. Let b be some element
from B, $b \in B$. Here there exists at least one $a_i$ s.t. $\mathrm{Comp}(b, a_i) = 1$. Thus we see that
$\mathrm{Comp}(b, a_j) \geq \mathrm{Comp}(d, a_j)$ for all j and there exists at least one i such that
$\mathrm{Comp}(b, a_i) > \mathrm{Comp}(d, a_i)$. Hence $\mathrm{Sup}_i(b) \geq \mathrm{Sup}_i(d)$ for all i and at least one
$\mathrm{Sup}_i(b) > \mathrm{Sup}_i(d)$. From this we see that $\mathrm{Sup}(b) \geq \mathrm{Sup}(d)$. Furthermore if F is strictly
monotonic then $\mathrm{Sup}(b) > \mathrm{Sup}(d)$. Since $\tilde{a}$ is the element which has the largest Sup, an
implication of this is that the aggregated value should be an element from the collection of input
values, in particular $\tilde{a} \in B$. Here B is a minimal solution set. This can be seen as a kind of
boundedness condition on the aggregated value: $\tilde{a}$ must be a value in the set B of input values.
This is of course a very natural value. We note that idempotency is also assured here - in the case
of idempotency all the elements $a_i = a$ and hence $B = \{a\}$.
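The argument above can be checked on a toy universe. The sketch below assumes the sum as F and
takes $\mathrm{Sup}_i(a) = \mathrm{Comp}(a, a_i)$; the colour labels and the names strict_comp and
total_support are hypothetical and used only for illustration.

```python
def strict_comp(a, b):
    # Comp(a, b) = 1 if a = b and 0 otherwise.
    return 1 if a == b else 0


def total_support(a, data):
    # Assumed F: the sum of the individual supports.
    return sum(strict_comp(a, a_i) for a_i in data)


X = ["red", "green", "blue", "yellow"]     # hypothetical universe
data = ["red", "blue", "blue", "green"]    # values supplied by four sources
B = set(data)                              # the set of input values

supports = {a: total_support(a, data) for a in X}
fused = max(X, key=supports.get)
```

With these choices every element outside B receives zero support, and the fused value is the
most frequently reported input.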

We next show that if Comp is any equivalence relationship then B still provides a minimal
solution set: no solution can be better than some element in B. We recall if Comp is an
equivalence relationship then for each $x \in X$ there exists a subset $E_x$ of X, called the equivalence
class of x, such that Comp(x, y) = 1 for all $y \in E_x$ and for any $z \notin E_x$ we have Comp(x, z) = 0. Let
$B^* = \bigcup_{x \in B} E_x$, the union of all equivalence classes corresponding to elements in B. If
$z \in X - B^*$ then $\mathrm{Comp}(z, a_i) = 0$ for all i. Since for any $b \in B^*$ there always exists some $a_i$ such that
$\mathrm{Comp}(b, a_i) = 1$, we have $\mathrm{Sup}(b) \geq \mathrm{Sup}(z)$. Thus there never exists an element $z \in X - B^*$ better than
any element in $B^*$. Furthermore if F is strictly monotonic then any element in $B^*$ is always better
than any $z \in X - B^*$. Thus we need never look for a solution in $X - B^*$. Furthermore we see the
following. Let $a \in B^*$ but $a \notin B$, thus a is in the equivalence class of some element in B. Let
$a \in E_b$ where $b \in B$. We see that for any $a_i$, $\mathrm{Comp}(b, a_i) = \mathrm{Comp}(a, a_i)$ hence $\mathrm{Sup}_i(b) = \mathrm{Sup}_i(a)$.
Thus there exists no element better than some element in B. Then B is a minimal solution set. Thus
we can bound our search for a solution by B.

We now turn to another type of compatibility relationship - the situation in which there
exists some linear ordering on the space X that underlies the compatibility relation. In particular
we let L be a linear ordering on X. We shall use the notation $x >_L y$ to indicate that x is higher in
the ordering than y. (At times we shall find it more natural to use the term larger in place of
higher.) We now require that our compatibility relation, in addition to being reflexive
and symmetric, be such that the closer the elements are in the ordering L the more compatible
they are. Thus we have that if $x >_L y >_L z$ then $\mathrm{Comp}(x, y) \geq \mathrm{Comp}(x, z)$. We say this connection
between ordering and compatibility is strict if $x >_L y >_L z$ implies $\mathrm{Comp}(x, y) > \mathrm{Comp}(x, z)$.

We now show that in this situation there also exists a boundedness condition. Let
$a^* = \mathrm{Max}_i[a_i]$; $a^*$ is the largest element of the input values with respect to the underlying
ordering $>_L$. Let $a_* = \mathrm{Min}_i[a_i]$, the smallest element in B with respect to the ordering. Then we
shall see that we can always find a solution $\tilde{a}$ satisfying

$a^* \geq_L \tilde{a} \geq_L a_*$

We see this as follows. Assume $b <_L a_*$, b is below $a_*$ in the ordering L, then for any input $a_i$ we
have $a_i \geq_L a_* >_L b$ hence $\mathrm{Comp}(a_i, a_*) \geq \mathrm{Comp}(a_i, b)$. From this it follows that
$\mathrm{Sup}(a_*) \geq \mathrm{Sup}(b)$. Thus $a_*$ is at least as good as any smaller b. Similarly, if $d >_L a^*$ then for any
input data $a_i$ we have $d >_L a^* \geq_L a_i$, therefore for all $a_i$ we have
$\mathrm{Comp}(a_i, a^*) \geq \mathrm{Comp}(a_i, d)$. From this we get that $\mathrm{Sup}(a^*) \geq \mathrm{Sup}(d)$. Thus we see the subset of X
lying between $a_*$ and $a^*$, with respect to the ordering L, provides a minimal solution set. This is
the usual boundary condition associated with mean aggregation operators. Thus we are beginning
to see that Agg must be a mean-like operator.
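The boundedness condition can be checked numerically. The sketch below assumes numeric data,
the usual ordering of the reals, the sum as F, and an exponentially decaying compatibility; all
of these are choices made here for illustration, and any compatibility that decreases with
distance in the ordering would serve.

```python
import math


def comp(x, y):
    # Assumed compatibility: strictly decreasing in the distance between x and y.
    return math.exp(-abs(x - y))


def sup(a, data):
    # Assumed F: the sum of the compatibilities with the inputs.
    return sum(comp(a, a_i) for a_i in data)


data = [2.0, 3.5, 7.0]
lo, hi = min(data), max(data)                 # smallest and largest inputs

grid = [i / 10 for i in range(-50, 151)]      # candidates from -5.0 to 15.0

# No candidate below the smallest input beats the smallest input itself,
# and no candidate above the largest input beats the largest input itself.
below_ok = all(sup(lo, data) >= sup(b, data) for b in grid if b < lo)
above_ok = all(sup(hi, data) >= sup(d, data) for d in grid if d > hi)

best = max(grid, key=lambda a: sup(a, data))
```

In this sketch the best candidate on the grid therefore lies between the smallest and largest
input values.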

Let us now consider the situation in which our underlying compatibility relationship is a
similarity relationship [12]. That is, we have the additional property that for all x, y and z

$\mathrm{Comp}(x, z) \geq \mathrm{Max}_y[\mathrm{Comp}(x, y) \wedge \mathrm{Comp}(y, z)]$.

An important feature of any similarity relationship is the following [12]. For every $\alpha \in T$ there
exists a partitioning of X into classes $E_j^\alpha$, $X = \bigcup_j E_j^\alpha$ and $E_i^\alpha \cap E_j^\alpha = \emptyset$ for $i \neq j$, such that
for all x and $y \in E_j^\alpha$ we have $\mathrm{Comp}(x, y) \geq \alpha$ and for all $z \notin E_j^\alpha$ and $x \in E_j^\alpha$ we have
$\mathrm{Comp}(x, z) < \alpha$.

Let B be our collection of observations, $B = \bigcup_i \{a_i\}$. Let $\alpha$ be the minimal similarity
between any elements in B, $\alpha = \mathrm{Min}_{a_i, a_j \in B}[\mathrm{Comp}(a_i, a_j)]$. Let the collection of subsets
$E_j^\alpha$ be the $\alpha$-level partitioning of X. There exists one member of this partitioning containing B.
Let $E_{j^*}^\alpha$ be the member of the partitioning containing B. Here then $B \subseteq E_{j^*}^\alpha$ and
$E_j^\alpha \cap B = \emptyset$ for all $j \neq j^*$. We shall now show that $E_{j^*}^\alpha$ provides a minimal solution set.
We see this as follows. If $z \in E_{j^*}^\alpha$ then $\mathrm{Comp}(z, y) \geq \alpha$ for all $y \in E_{j^*}^\alpha$. Since
$B \subseteq E_{j^*}^\alpha$ then $\mathrm{Comp}(z, a_i) \geq \alpha$ for all inputs. Assume $x \notin E_{j^*}^\alpha$, then
$\mathrm{Comp}(x, y) < \alpha$ for all $y \in E_{j^*}^\alpha$. Since $B \subseteq E_{j^*}^\alpha$ then $\mathrm{Comp}(x, a_i) < \alpha$ for all
i. From this it follows $\mathrm{Sup}_i(z) \geq \mathrm{Sup}_i(x)$ and hence $\mathrm{Sup}(z) \geq \mathrm{Sup}(x)$. Thus we see an element not
in $E_{j^*}^\alpha$ can't score better than an element in $E_{j^*}^\alpha$, thus $E_{j^*}^\alpha$ provides a minimal solution
set when our Comp relationship is a similarity relationship.
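The construction can be traced on a small finite universe. In the sketch below the similarity
matrix S, the five-element universe, the helper names, and the use of the sum for F are
hypothetical choices made only for illustration.

```python
from itertools import product

# Hypothetical similarity relation on X = {0, 1, 2, 3, 4}, given as a matrix.
S = [
    [1.0, 0.8, 0.8, 0.3, 0.3],
    [0.8, 1.0, 0.8, 0.3, 0.3],
    [0.8, 0.8, 1.0, 0.3, 0.3],
    [0.3, 0.3, 0.3, 1.0, 0.6],
    [0.3, 0.3, 0.3, 0.6, 1.0],
]
X = range(len(S))


def is_similarity(S):
    # Reflexive, symmetric, and Comp(x, z) >= Comp(x, y) ^ Comp(y, z) for all x, y, z.
    n = range(len(S))
    reflexive = all(S[i][i] == 1.0 for i in n)
    symmetric = all(S[i][j] == S[j][i] for i, j in product(n, repeat=2))
    transitive = all(S[i][k] >= min(S[i][j], S[j][k])
                     for i, j, k in product(n, repeat=3))
    return reflexive and symmetric and transitive


def alpha_partition(S, alpha):
    # Group elements whose similarity to a class representative is at least alpha.
    classes = []
    for x in range(len(S)):
        for cls in classes:
            if S[x][cls[0]] >= alpha:
                cls.append(x)
                break
        else:
            classes.append([x])
    return classes


data = [0, 1]                                        # observations B
alpha = min(S[i][j] for i in data for j in data)     # minimal similarity within B
classes = alpha_partition(S, alpha)
home = next(c for c in classes if set(data) <= set(c))

# Assumed F: the sum of compatibilities with the observations.
sup = {a: sum(S[a][a_i] for a_i in data) for a in X}
```

Here the class containing B also contains the unobserved element 2, and every element of that
class has at least as much support as any element outside it.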

Finally we introduce a trivial example of a proximity relationship. We shall say that a
proximity relationship has a wild card if there exists an element $x \in X$ having the property
Prox(x, y) = 1 for all $y \in X$. In this case x is called a wild card. Clearly if a proximity
relationship has a wild card it is always a best fused value; it provides a minimal solution set.

6.  Some  Features  on  the  Function  F 

We described the process of determining the fused value to a data collection $\langle a_1, \ldots, a_q \rangle$
as to be conceptually implemented by the following process:

1) For any $a \in X$ obtain $\mathrm{Sup}_i(a) = \mathrm{Comp}(a, a_i)$

2) Evaluate $\mathrm{Sup}(a) = F(\mathrm{Sup}_1(a), \ldots, \mathrm{Sup}_q(a))$

3) Select as fused value the $\tilde{a}$ such that $\mathrm{Sup}(\tilde{a}) = \max_{a \in X}[\mathrm{Sup}(a)]$

We explicitly made two assumptions about the function F. The first was that
F is symmetric with respect to the arguments: the indexing of input information is not
relevant. The second assumption we made about F was that it is monotonic with respect to
its argument values. An implicit assumption we made about F was an assumption of
pointwiseness. Here the determination of the valuation of any solution a, Sup(a), depends only on a;
it is independent of any other possible solutions. This property imposes the condition of
indifference to irrelevant alternatives, a requirement that plays a central role in two fundamental
works on opinion aggregation: Arrow's impossibility theorem [17] and Nash's bargaining problem
[18]. Specifically this condition assures us that the inclusion or removal of possible solutions,
elements from X, doesn't affect the relationship between other solutions. Essentially this assures
that if Sup(x) > Sup(y) then this will not change if we add or remove another element z from the
space X.

There exists another property we want to associate with F; it is closely related to the idea
of self-identity discussed by Yager and Rybalov [19]. Assume that we have a data set
$\langle a_1, \ldots, a_q \rangle$ and using our procedure we find that $\tilde{a}$ is the best solution,
$\mathrm{Sup}(\tilde{a}) \geq \mathrm{Sup}(x)$ for all x in X. Assume now that we are provided an additional piece of data
$a_{q+1}$ for which $a_{q+1} = \tilde{a}$; the new data suggests $\tilde{a}$ as its value. Then clearly $\tilde{a}$ should still be
the best solution. We shall formalize this requirement. In the following we let $\tilde{a}$ and $\hat{a}$ be two
solutions and let $\tilde{c}_i = \mathrm{Comp}(\tilde{a}, a_i)$ and $\hat{c}_i = \mathrm{Comp}(\hat{a}, a_i)$. We note that if $a_{q+1} = \tilde{a}$ then
$\tilde{c}_{q+1} = \mathrm{Comp}(\tilde{a}, a_{q+1}) \geq \tilde{c}_i$ for all i = 1 to q and $\tilde{c}_{q+1} \geq \hat{c}_i$ for i = 1 to q + 1. Using
this we can more formally express our additional requirement on F. If

$F(\tilde{c}_1, \ldots, \tilde{c}_q) \geq F(\hat{c}_1, \ldots, \hat{c}_q)$

and if $\tilde{c}_{q+1} \geq \tilde{c}_j$ for j = 1 to q and $\tilde{c}_{q+1} \geq \hat{c}_j$ for j = 1 to q + 1 then we require that

$F(\tilde{c}_1, \ldots, \tilde{c}_q, \tilde{c}_{q+1}) \geq F(\hat{c}_1, \ldots, \hat{c}_q, \hat{c}_{q+1})$.

Our work so far assumed a very general formulation for F. If we consider the situation in
which the compatibility relation takes its values in the unit interval [0, 1], one formulation for F
that meets all our required conditions is the sum or totaling function,
$F(x_1, x_2, \ldots, x_q) = \sum_{i=1}^{q} x_i$. Using this we get
$\mathrm{Sup}(a) = \sum_{i=1}^{q} \mathrm{Sup}_i(a) = \sum_{i=1}^{q} \mathrm{Comp}(a, a_i)$. Thus our fused value is the element
that maximizes the sum of its compatibilities with the input.

What becomes clear here is that the fused value depends very strongly on the compatibility
relationship. Let us consider the special situation in which our variable takes as its value numbers
and the compatibility between elements is directly related to the distance between the elements.
Here then $\mathrm{Comp}(x, y) = 1 - \frac{D(x, y)}{D_{\max}}$ where $D_{\max}$ is the largest distance. Using this
compatibility relationship we get

$\mathrm{Sup}(a) = \sum_{i=1}^{q} \frac{D_{\max} - D(a, a_i)}{D_{\max}}$

The fused value, $\tilde{a}$, is the value $a \in X$ that maximizes Sup(a). Here this is the value a that
minimizes $\sum_{i=1}^{q} D(a, a_i)$. If $D(a, a_i)$ is taken as the square difference,
$D(a, a_i) = (a - a_i)^2$, then it is easy to show that $\tilde{a} = \frac{1}{q} \sum_{i=1}^{q} a_i$, which is simply
the average.

Thus we see using the average of the data values is a special case under particular assumptions for
F and Comp. We should point out that this assumption about the compatibility relationship may
not always be the appropriate assumption and hence the use of the average may not always be
appropriate.
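The claim can be confirmed numerically: setting the derivative of the summed squared differences
to zero gives the mean, and a grid search over candidates agrees. The data values, the grid, and
the choice of $D_{\max}$ as the squared range of the data are assumptions made for this sketch.

```python
data = [2.0, 4.0, 9.0]
d_max = (max(data) - min(data)) ** 2      # assumed largest squared distance


def comp(x, y):
    # Comp(x, y) = 1 - D(x, y) / D_max with D the squared difference.
    return 1.0 - (x - y) ** 2 / d_max


def sup(a):
    # F is the sum, so Sup(a) is the summed compatibility with the inputs.
    return sum(comp(a, a_i) for a_i in data)


grid = [i / 100 for i in range(200, 901)]  # candidates 2.00, 2.01, ..., 9.00
best = max(grid, key=sup)
mean = sum(data) / len(data)
```

On this grid the maximizer of Sup coincides with the arithmetic mean of the data.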


7.  On  Monotonicity  in  Ordinal  Spaces 

We again return to the case in which the data is drawn from a set which has an ordering
that is used to generate the proximity relationship. In the preceding sections we showed that in
this case our fused value $\tilde{a}$ must be bounded, $\mathrm{Min}_j[a_j] \leq \tilde{a} \leq \mathrm{Max}_j[a_j]$. Let us now look at a
condition of monotonicity of aggregated value usually associated with this environment: if
$a_j \geq \hat{a}_j$ for all j then this monotonicity property requires that
$\mathrm{Agg}(a_1, \ldots, a_q) \geq \mathrm{Agg}(\hat{a}_1, \ldots, \hat{a}_q)$. We shall see that
our formulation doesn't impose this condition. This is as it should be in a general formulation,
for this generality allows us to include "mode-like" type fusion methods which are known not to
be monotonic in the sense described above.

Assume the data set has been indexed such that $a_1 \leq a_2 \leq \ldots \leq a_q$. Let its fused value be $\tilde{a}$.
Let $\hat{a}_j$ be another collection of q values such that $\hat{a}_j = a_j$ for all $j \neq k$ and having $\hat{a}_k > a_k$.
Thus all the elements in $\hat{a}_j$ are equal to those in $a_j$ except the $k$th. Data value monotonicity
requires us to show that if $\mathrm{Agg}(\hat{a}_1, \ldots, \hat{a}_q) = \hat{a}$ then $\hat{a}$ can not be less than $\tilde{a}$. Let us see
what is needed to assure this requirement.

It appears that one feature that leads to difficulty in attaining this type of monotonicity is
the use of proximity relationships that manifest saturation. We refer to figure 2 to understand
this idea.


[Figure 2. Illustration of Saturation: points $x_1$, $x_2$, $x_3$, $x_4$ marked along a line.]

If $\mathrm{Prox}(x_1, x_3) < \mathrm{Prox}(x_2, x_3)$ and $\mathrm{Prox}(x_1, x_4) = \mathrm{Prox}(x_2, x_4)$ then we have saturation.
Here $x_4$ has moved so far away from $x_1$ and $x_2$ that their proximity to $x_4$ has become about the
same. To illustrate how this saturation effect interferes with the attainment of monotonicity
consider the following illustration (figure 3a).

[Figure 3a. Effect of Saturation on Monotonicity: data points $a_1$, $a_2$, $a_3$ and proposed fused
values x and y marked along a line.]

In the above let $a_1$, $a_2$ and $a_3$ be three pieces of data and let x and y be two proposed fused
values. Assume here that the F and the Prox relationship are such that y is a better solution than
x. This is essentially based on the fact that y is closer to the data points $a_2$ and $a_3$. Now
consider figure 3b.


[Figure 3b. Effect of Saturation on Monotonicity: the same line with the data point $a_3$ moved
further to the right.]

In figure 3b we have increased the value of $a_3$. If this increase is of such a nature that it
eliminates any distinction between the compatibility of $a_3$ with x or y then comparison of x and y
as possible fused values is simply based on their relationships with $a_1$ and $a_2$. Since x is closer to
$a_1$ than y is to $a_2$ the possibility arises for x to be a better choice for fused value than y. Thus we
see that in the face of this saturation, an increase in data value can result in a decrease in fused
value.
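The effect can be reproduced with a small numerical sketch. The compatibility function below,
which drops to zero once two points are at least five units apart, the five data values, the
integer candidate set, and the use of the sum for F are all hypothetical choices made for
illustration; they are not the configuration drawn in figures 3a and 3b.

```python
def comp(x, y, reach=5.0):
    # Assumed saturating compatibility: zero for points at least `reach` apart.
    return max(0.0, 1.0 - abs(x - y) / reach)


def fused_value(data, candidates=range(0, 31)):
    # Assumed F: the sum; return the candidate with the largest total support.
    return max(candidates, key=lambda a: sum(comp(a, a_i) for a_i in data))


before = [1, 1, 6, 8, 9]
after = [1, 1, 6, 8, 30]    # only the last value changed, and it increased

fused_before = fused_value(before)
fused_after = fused_value(after)
```

Once the last data value has moved out of reach of every other candidate it no longer
distinguishes between them, and the fused value falls from the upper cluster to the lower one.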

Partitioning type equivalence relationships are particular examples of this type of
saturating proximity relationships. Let X be an ordered space $x_1 > x_2 > \ldots > x_n$. Assume Prox is
an equivalence relationship in which $A_j$ are the equivalence classes, $X = \bigcup_j A_j$. Let
$x_{k_1} < x_{k_2} < x_{k_3}$. Assume $x_{k_1}$ and $x_{k_2} \in A_1$ and $x_{k_3} \in A_3$. Then
$\mathrm{Prox}(x_{k_1}, x_{k_2}) = 1$ and $\mathrm{Prox}(x_{k_1}, x_{k_3}) = 0$, $\mathrm{Prox}(x_{k_2}, x_{k_3}) = 0$. Assume now $x_{k_2}$
increases to $\hat{x}_{k_2} \in A_2$. Now we have $\mathrm{Prox}(x_{k_1}, \hat{x}_{k_2}) = 0$,
$\mathrm{Prox}(\hat{x}_{k_2}, x_{k_3}) = 0$ and $\mathrm{Prox}(x_{k_1}, x_{k_3}) = 0$.

As  we  already  noted  this  lack  of  universally  requiring  monotonicity  with  respect  to  data 
values  allows  us  to  include  different  types  of  fusion  techniques  in  our  framework.  A  notable 
example  of  a  non-monotonic  type  fusion  operator  is  the  mode.  Consider  the  case  where  we  have 
observations  10,  10,  10,  15,  15,  15,  15.  The  mode  is  15.  Assume  a  data  set  10,  10,  10,  15,  15,  20, 
20.  We  have  increased  two  of  the  fifteens  to  20.  Our  mode  becomes  10. 
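The mode example can be replayed directly. The sketch assumes the strict compatibility
(Comp(a, b) = 1 if a = b and 0 otherwise) together with the sum as F, under which the total
support of a value is simply the number of sources reporting it; the function name mode_fusion
is hypothetical.

```python
from collections import Counter


def mode_fusion(data):
    # With the strict Comp and F = sum, Sup(a) is the count of a in the data.
    counts = Counter(data)
    return max(counts, key=counts.get)


before = [10, 10, 10, 15, 15, 15, 15]
after = [10, 10, 10, 15, 15, 20, 20]    # two of the fifteens raised to 20

fused_before = mode_fusion(before)
fused_after = mode_fusion(after)
```

No data value decreased between the two data sets, yet the fused value drops.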


8.  Credibility  Weighted  Sources 

In the preceding we have implicitly assumed all the data had the same credibility. Here
we shall consider the situation in which each piece of data has a credibility weight $w_i$ (our
inputs are q pairs $(w_i, a_i)$). We also note that the weight $w_i$ must be drawn from a scale that has
at least an ordering. In addition we assume this scale has minimal and maximal elements denoted
0 and 1.

Again in this situation for any $a \in X$ we calculate $\mathrm{Sup}(a) = F(\mathrm{Sup}_1(a), \ldots, \mathrm{Sup}_q(a))$ where
$\mathrm{Sup}_i(a)$ is the support for a from the data supplied by source i, $(w_i, a_i)$. However in this case,
$\mathrm{Sup}_i(a)$ depends upon two components. The first being the compatibility of a with $a_i$,
$\mathrm{Comp}(a, a_i)$, and the second being the weight or strength of credibility of source i. Thus in this
case

$\mathrm{Sup}_i(a) = g(w_i, \mathrm{Comp}(a, a_i))$

Ideally we desire that both $w_i$ and $\mathrm{Comp}(a, a_i)$ be drawn from the same scale, which has at least
an ordering. For the following discussion we shall not implicitly make this assumption.
However, we shall find it convenient to use 0 and 1 to indicate the least and greatest element on
each of the scales. We now specify the properties that are required of the function g. A first
property we require of g is monotonicity with respect to both of the arguments: $g(x, y) \geq g(z, y)$ if
x > z and $g(x, y) \geq g(x, w)$ if y > w. Secondly we assume that zero credibility or zero
compatibility results in zero support: g(x, 0) = g(0, y) = 0 for all x and y. We see that g has the
character of an "and" type operator. In particular at a semantic level we see that what we are
essentially saying is:

"source i provides support for a solution if the source is credible and the solution is compatible
with the source's data".

With this we see that g(1, 1) = 1 and $g(x, y) \neq 0$ if $x \neq 0$ and $y \neq 0$. We must make one further
observation about this process with respect to source credibility. Any source that has zero
credibility should in no way affect the decision process. Thus if $((w_1, a_1), \ldots, (w_q, a_q))$ has as
its fused value $\tilde{a}$ then the data $((w_1, a_1), \ldots, (w_q, a_q), (w_{q+1}, a_{q+1}))$ where $w_{q+1} = 0$ should
also have the same result. With this understanding we can discard any source with zero credibility.
In the following we shall assume unless otherwise stated all sources have non-zero credibility.

We now show that the boundary conditions also hold in this case where the sources have
weights. First let our situation be one in which our Comp relationship is strict; it is such that
Comp(a, b) = 1 if a = b and Comp(a, b) = 0 if $a \neq b$. Again let $B = \bigcup_i \{a_i\}$, the set of all the
values provided by the sources. If $a \notin B$ then for all i we have $\mathrm{Comp}(a, a_i) = 0$ and from the
above requirements we get $\mathrm{Sup}_i(a) = 0$. Let $b \in B$, then $\mathrm{Comp}(b, a_i) = 1$ for all i such that $a_i = b$.
Hence $\mathrm{Comp}(b, a_i) \geq \mathrm{Comp}(a, a_i)$ for all i and therefore $\mathrm{Sup}_i(b) \geq \mathrm{Sup}_i(a)$ for all i and hence
$\mathrm{Sup}(b) \geq \mathrm{Sup}(a)$. Therefore we can always find the fused solution in B.
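A short sketch may make the role of the weights concrete. It assumes the product as one
admissible "and"-type choice for g (the Min would be another), the sum as F, and the strict
compatibility; the labels, weights, and function names are hypothetical.

```python
def g(w, c):
    # Assumed g: the product. It is monotonic in both arguments,
    # g(x, 0) = g(0, y) = 0, and g(1, 1) = 1.
    return w * c


def strict_comp(a, b):
    return 1.0 if a == b else 0.0


def sup(a, pairs):
    # Sup_i(a) = g(w_i, Comp(a, a_i)), combined with F = sum.
    return sum(g(w_i, strict_comp(a, a_i)) for w_i, a_i in pairs)


def fuse_weighted(candidates, pairs):
    return max(candidates, key=lambda a: sup(a, pairs))


X = ["A", "B", "C", "D"]
pairs = [(0.9, "A"), (0.4, "B"), (0.4, "B"), (0.0, "C")]   # (w_i, a_i)

fused = fuse_weighted(X, pairs)
fused_without_zero = fuse_weighted(X, [p for p in pairs if p[0] > 0])
```

Here the single highly credible report of "A" outweighs the two weaker reports of "B", the
fused value lies in B, and discarding the zero-credibility source leaves the result unchanged.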

In a similar way we can show that when the proximity relationship is based on an ordering
and the input data have weights it is the case the fused value must lie between the largest and
smallest input values. The justification of this is based on the monotonicity of g with respect to
the compatibility. In particular if $\mathrm{Min}_j[a_j] = a_1$ and if $a < a_1$ then $\mathrm{Comp}(a, a_j) \leq \mathrm{Comp}(a_1, a_j)$ for
all j, hence $g(w_j, \mathrm{Comp}(a, a_j)) \leq g(w_j, \mathrm{Comp}(a_1, a_j))$ and hence $\mathrm{Sup}_j(a) \leq \mathrm{Sup}_j(a_1)$ for all j. In a
similar way we can show that if $\mathrm{Max}_j[a_j] = a_q$ and if $a > a_q$ then $\mathrm{Sup}_j(a) \leq \mathrm{Sup}_j(a_q)$ for all j.


9.  On  the  Inclusion  of  Reasonableness 

In  an  earlier  part  we  introduced  the  idea  of  reasonableness  and  indicated  its  importance  in 
the  data  fusion  process.  At  a  meta  level  we  mean  to  use  this  idea  to  introduce  any  information, 
exclusive of the data provided by the sources, we may have about the value of the variable of interest.
The  information  we  have  about  reasonableness  will  affect  the  fusion  process  in  at  least  two  ways. 
First  it  will  interact  with  the  data  provided  by  the  sources.  In  particular,  the  weight  (credibility) 
associated  with  a  source  providing  an  unreasonable  input  value  should  be  diminished  and  hence 
its  importance  in  the  fusion  process  reduced.  Secondly  some  mechanism  should  be  included  in 
the  fusion  process  to  block  unreasonable  values  from  being  provided  as  the  fused  value. 

A  complete  discussion  of  the  issues  related  to  the  representation  and  inclusion  of 
reasonableness  in  the  data  fusion  process  is  complex  and  beyond  our  immediate  aim  as  well  as 
beyond  our  complete  understanding  at  this  time.  In  many  ways  the  issue  of  reasonableness  goes 
to  the  very  heart  of  intelligence.  We  leave  a  more  expansive  discussion  of  this  concept  to  some 
future  work.  Here  we  shall  focus  on  the  representation  of  some  very  specific  type  of  knowledge 
and  its  role  in  introducing  a  consideration  of  reasonableness  in  the  fusion  process. 

Information about the reasonableness of values of a variable of interest can be either
pointed or diffuse. By pointed we mean information specifically about the object while diffuse
information is about objects of a class in which our object of interest lies. Generally, pointed
information has a possibilistic nature, while diffuse information is probabilistic. Here we consider
the situation in which our information about reasonableness is pointed and captured by a fuzzy
subset, a mapping $R: X \to T$, and thus has a possibilistic nature. Here for any $x \in X$, R(x)
indicates the reasonableness (or possibility) that x is a solution of the variable of interest. For
example, if our interest is to obtain a person's age and before soliciting data from external sources
we know that the person is young then we can capture this information with R and thus constrain
the values that are reasonable [20, 21]. In the above we assume T is a linear ordering having
maximal and minimal elements, usually denoted 1 and 0.




Let us see how we can include this information into our data fusion process. Assume the
data provided by source i is denoted $a_i$ and $w_i$ is the credibility assigned to source i. We shall
assume these credibilities are measured on the same scale as the reasonableness. In the fusion
process the importance weight, $u_i$, assigned to the data $a_i$ should be a function of the credibility of
the source, $w_i$, and the reasonableness of the data, $R(a_i)$. An unreasonable value, whatever the
credibility of the source, should not be given much consideration in the fusion, as is the case for a
piece of data coming from a source with low credibility whatever the reasonableness of its data.
Using the Min to implement this "anding" we obtain $u_i = \mathrm{Min}[R(a_i), w_i]$ as the importance weight
assigned to the data $a_i$ coming from this source. In this environment the information that goes to
the fusion mechanism is the collection $\langle (u_1, a_1), \ldots, (u_q, a_q) \rangle$.

As in the preceding, the support for a proposed fused value a should be a function of its
support from the input data:

Sup(a) = F(Sup_1(a), ..., Sup_q(a))

The support provided from source i for solution a should depend on the importance weight u_i
assigned to data supplied by source i as well as the compatibility of the data a_i and the proposed
fused value, Comp(a, a_i). In addition, we must also include information about the reasonableness
of the proposed solution a. For a solution a to get support from source i it should be compatible
with the data a_i and compatible with what we consider to be reasonable, Comp(a, R); we let
Comp_i(a) = Comp(a, a_i) ∧ Comp(a, R). Furthermore, Comp(a, R) = R(a) and hence Comp_i(a) =
Comp(a, a_i) ∧ R(a). In addition, as we have indicated, the support afforded any solution by
source i should be determined in part by the importance weight assigned to i. Taking these
considerations into account, we get Sup_i(a) = g(u_i, Comp_i(a)). Substituting our values we get:

Sup_i(a) = g(w_i ∧ R(a_i), Comp(a, a_i) ∧ R(a))

What is clear is that g should be monotonically increasing in both its arguments and be such that
if any of the arguments is 0 then Sup_i(a) = 0. In the case in which we interpret g as
implementing an anding and use the Min operator as our and, we get Sup_i(a) = w_i ∧ R(a_i) ∧
R(a) ∧ Comp(a, a_i). Here we observe that the support afforded from source i to any proposed
fused solution is related to the credibility of the source, the reasonableness of the value provided by
the source, the reasonableness of the proposed fusion solution and the compatibility of the data
and solution.
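The Min form of the source support can be written directly. The sketch below assumes a numeric space and uses hypothetical Comp and R functions chosen only so that the arithmetic is easy to follow; neither is prescribed by the report.

```python
def source_support(a, a_i, w_i, R, comp):
    """Sup_i(a) = w_i ∧ R(a_i) ∧ R(a) ∧ Comp(a, a_i), with ∧ taken as Min."""
    return min(w_i, R(a_i), R(a), comp(a, a_i))


def comp(x, y):
    # Hypothetical compatibility: falls linearly to 0 at a distance of 10.
    return max(0.0, 1.0 - abs(x - y) / 10)


def R(x):
    # Hypothetical reasonableness: values above 50 are ruled out.
    return 1.0 if x <= 50 else 0.0


print(source_support(30, 35, 0.9, R, comp))  # 0.5, limited by compatibility
print(source_support(60, 58, 0.9, R, comp))  # 0.0, both values are unreasonable
```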


Let  us  see  how  the  introduction  of  reasonableness  affects  our  results  about  boundedness 
and  minimal  solution  sets. 

Consider the case in which our underlying proximity relationship is very precise,
Comp(x, y) = 1 iff x = y and Comp(x, y) = 0 if x ≠ y. Let B be the set of input values and let B̂ be
the subset of B such that b ∈ B̂ if R(b) ≠ 0. If a ∉ B then Comp(a, a_i) = 0 for all a_i and hence
Sup_i(a) = 0 for all i. Let d ∈ B − B̂; here R(d) = 0 and again we get that Sup_i(d) = 0 for all i. On
the other hand, for b ∈ B̂ we have R(b) ≠ 0 and b = a_j for some j, and hence Sup_j(b) > 0 (provided
w_j > 0). Thus, we see that we will always find our solution in the space B̂, the set of data values
that are not completely unreasonable.

Consider now the case in which Prox is an ordinary equivalence relation. Again let B̂ be
our set of input data which have some degree of reasonableness. Let E_i be the equivalence class
of a_i; for all y ∈ E_i, Prox(y, a_i) = 1. Let E = ∪_i E_i, the union of all equivalence classes that have
an input value. If a ∉ E then Prox(a, a_i) = 0 for all i. From this we see that if a ∉ E then Sup_i(a) = 0
for all i and hence we can always find at least as good a solution in E. We can obtain a further
restriction on the minimal solutions. Let D_i ⊆ E_i be such that d_i ∈ D_i if R(d_i) = Max_{x ∈ E_i}(R(x)).
Thus, D_i is the subset of elements that are equivalent to a_i and are most reasonable. For any d_i ∈
D_i and any e_i ∈ E_i we have that for all input data a_j, Comp(e_i, a_j) = Comp(d_i, a_j). Since R(d_i) ≥
R(e_i) we see that Sup_j(d_i) ≥ Sup_j(e_i) for all j. Hence d_i is always at least as good a fused value as
any element in E_i. Thus, we can always find a fused solution in D = ∪_i D_i. Furthermore, if x
and y ∈ D_i then R(x) = R(y) and Comp(x, z) = Comp(y, z) for all z. Hence Sup_j(x) = Sup_j(y) and
Sup(x) = Sup(y). The result is that we can consider any element in D_i. Thus all we need consider
is the set D̂ = ∪_i {d_i} where d_i is any element from D_i. We note that if a_i ∈ D_i then this is of
course the preferred element.

We now consider the case where the proximity relationship is based on a linear ordering L
over space X. Let B be the set of data values provided by the sources. Let x^* and x_* be the
maximal and minimal elements in B with respect to the ordering L. Let H be the set of x_j such that
x^* ≥_L x_j ≥_L x_*. In the preceding we showed that we can always find a fused value element a in H.




We now show that the introduction of reasonableness removes this property.

In the preceding we indicated that for any proposed fused value we get that
Sup_i(a) = g(u_i, Comp_i(a)) where g is monotonic in both arguments, u_i = w_i ∧ R(a_i) and
Comp_i(a) = R(a) ∧ Comp(a, a_i). We shall now show that here we can have an element a ∉ H for
which Sup_i(a) ≥ Sup_i(b) for all b ∈ H. This implies that we cannot be guaranteed of finding the
fused value in H. Consider now the case in which there exists b ∈ H for which R(b) ≤ α. In this
case Sup_i(b) = g(u_i, R(b) ∧ Comp(b, a_i)) ≤ g(u_i, α). Let a ∉ H be such that R(a) > α. For this
element we get Sup_i(a) = g(u_i, R(a) ∧ Comp(a, a_i)). If Comp(a, a_i) > α then R(a) ∧ Comp(a, a_i)
= β with β > α, and hence Sup_i(a) = g(u_i, β) ≥ g(u_i, α) ≥ Sup_i(b), so it is not true that we can
eliminate a as a solution. Thus we see that the introduction of this reasonableness allows for the
possibility of solutions not bounded by the largest and smallest of the input data.
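A small numeric illustration of this argument is given below. Everything in it is a hypothetical choice made for the sketch: the space X = {0, ..., 10}, the two data values, the Comp and R functions, g taken as the product (which is monotonic and zero when either argument is zero), and F taken as the Min over sources.

```python
X = range(0, 11)   # hypothetical linearly ordered space
data = [4, 6]      # input values, so H = {4, 5, 6}
w = [1.0, 1.0]     # both sources fully credible


def comp(x, y):
    # Hypothetical compatibility induced by the ordering.
    return 1.0 - abs(x - y) / 10


def R(x):
    # Hypothetical reasonableness: 7 is far more reasonable than anything else.
    return 1.0 if x == 7 else 0.5


def sup_i(a, a_i, w_i):
    u_i = min(w_i, R(a_i))                 # importance weight
    return u_i * min(R(a), comp(a, a_i))   # g assumed to be the product


def sup(a):
    # F assumed to be the Min over the sources.
    return min(sup_i(a, a_i, w_i) for a_i, w_i in zip(data, w))


H = [x for x in X if min(data) <= x <= max(data)]
best = max(X, key=sup)
print(best, H)  # 7 [4, 5, 6]
```

Every element of H has R = 0.5, which caps its support, while 7 lies outside H and still obtains strictly larger support from both sources.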

An intuitive boundary condition can be found in this situation. Again let H be the subset
of X bounded by our data, H = {x | x^* ≥_L x ≥_L x_*}, and let α^* = R(x^*) and α_* = R(x_*). Let
H^* = {x | x >_L x^* and R(x) > R(x^*)} and H_* = {x | x <_L x_* and R(x) > R(x_*)}. Here we can restrict
ourselves to looking for the fused value in the set Ĥ = H ∪ H^* ∪ H_*. We see that as follows. For
any x >_L x^* we have, since the proximity relationship is induced by the ordering, that Comp(x, a_i)
≤ Comp(x^*, a_i) for all data a_i. If in addition we have that R(x) ≤ R(x^*) then Sup_i(x) = g(u_i, R(x)
∧ Comp(x, a_i)) ≤ Sup_i(x^*) = g(u_i, R(x^*) ∧ Comp(x^*, a_i)) for all i and hence Sup(x) ≤ Sup(x^*).
Thus we can eliminate all x >_L x^* having R(x) ≤ R(x^*). Using similar arguments we can eliminate
all x <_L x_* which have R(x) ≤ R(x_*).

10.  Granular  Objects  as  Fused  Value 

In  the  preceding  sections  we  considered  the  situation  in  which  we  were  required  to  find,  as 
the  fused  value,  some  solution  that  was  an  element  of  the  set  X.  We  now  look  at  the  situation  in 
which  we  allow  our  solution  to  be  some  subset  of  X.  The  use  of  subsets  as  our  fused  value  is  an 
example  of  what  Zadeh  [9]  calls  granulation.  For  simplicity  we  initially  will  not  include  any 

considerations  of  reasonableness  and  credibility. 

Again assume V can take its value in X. Let our data be the collection a_i for i = 1 to q.
These are the values we desire to fuse. Here we assume the existence of a proximity relationship
Prox on X × X where Prox(x, y) takes its value in an ordered space T which for simplicity in the
following we shall assume to be the unit interval.

Let A be any subset of X. Using the approach introduced in the preceding we define the
support for A as the fused value

Sup(A) = F(Sup_1(A), Sup_2(A), ..., Sup_q(A))

As in the preceding we then select as our fused value the subset A having the largest value for Sup.

Here Sup_i(A) is the support for A from data a_i; it is the compatibility of a_i with A, which
we obtain as

Sup_i(A) = Max_{y ∈ A}[Comp(y, a_i)],

the maximal compatibility of a_i with any element in A. We note that if A is a fuzzy subset we can
express this as

Sup_i(A) = Max_{y ∈ X}[A(y) ∧ Comp(y, a_i)]

Recalling that Comp(y, a_i) is the support for y from a_i we can express Sup_i(A) as

Sup_i(A) = Max_{y ∈ X}[A(y) ∧ Sup_i(y)]

The extension of this situation to include considerations of credibility of the data, w_i, and a
reasonableness function over X is rather straightforward. Here we recall

Sup_i(y) = g(u_i, R(y) ∧ Comp(y, a_i))

where u_i = w_i ∧ R(a_i). Using this we can get

Sup_i(A) = Max_{y ∈ X}[A(y) ∧ g(u_i, R(y) ∧ Comp(y, a_i))]

If we further assume that g is implemented using an anding,

g(u_i, R(y) ∧ Comp(y, a_i)) = u_i ∧ R(y) ∧ Comp(y, a_i)

we get that

Sup_i(A) = u_i ∧ Max_{y ∈ X}[A(y) ∧ R(y) ∧ Comp(y, a_i)]

In the following discussion we shall initially neglect considerations of credibility and
reasonableness; we assume w_i = 1 for all i and R(x) = 1 for all x. In this case Sup_i(A) =
Max_{y ∈ X}[A(y) ∧ Comp(y, a_i)].
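The support for a (possibly fuzzy) subset from one source can be sketched as below, with the anding implemented as Min. The representation of A as a dictionary of membership grades, the default arguments and the Comp function are assumptions of the sketch; A is taken to be non-empty.

```python
def subset_support(A, a_i, comp, u_i=1.0, R=lambda y: 1.0):
    """Sup_i(A) = u_i ∧ Max_y[A(y) ∧ R(y) ∧ Comp(y, a_i)], with ∧ taken as Min.

    A maps elements of X to membership grades; elements not listed have grade 0
    and therefore cannot contribute to the Max. A is assumed non-empty.
    """
    best = max(min(grade, R(y), comp(y, a_i)) for y, grade in A.items())
    return min(u_i, best)


def comp(x, y):
    # Hypothetical compatibility: falls linearly to 0 at a distance of 10.
    return max(0.0, 1.0 - abs(x - y) / 10)


fuzzy_A = {20: 0.5, 30: 1.0, 40: 0.5}
print(subset_support(fuzzy_A, 35, comp))      # 0.5
print(subset_support({30: 1.0}, 30, comp))    # 1.0
```

With the defaults u_i = 1 and R(y) = 1 this reduces to the simplified form Max_y[A(y) ∧ Comp(y, a_i)].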

Usually when we allow subsets as the fused value there are some constraints on which
subsets we can use. First, however, let us consider the situation in which we have no explicitly
stated restriction on which subsets we can use as the fused value. Let B be the subset of all the
data, B = {a_1, ..., a_q}. Here Sup_i(B) = Max_{y ∈ B}[Comp(y, a_i)] and since a_i ∈ B then Sup_i(B) = 1
for all i. From this it follows that for any other subset A of X, Sup_i(B) ≥ Sup_i(A), hence Sup(B) ≥
Sup(A). Thus, if we have no restrictions on which subsets we can use as the fused value, the best
solution is always the subset consisting of all the input data values. We note that even in the case
of introducing differing credibilities associated with the sources the set B is still the best answer.
This however may not be the case when we consider reasonableness.

Let  us  now  turn  to  the  more  usual  situation  in  which  there  are  some  constraints  on  which 
subsets we allow as the fused value. Sometimes these constraints are explicit, other times they
may be implicit.

A number of methods can be described for introducing meaningful constraints on the
allowable subsets. One approach is to supply the system with a collection of subsets of X from
which it must select the fused value. We call this the case of a user supplied vocabulary. In this
case if 𝒜 = {A_1, ..., A_m} is the user supplied vocabulary we then select as our fused value the subset
A* ∈ 𝒜 such that Sup(A*) = Max_{A_j ∈ 𝒜}[Sup(A_j)]. We pick the term (subset) in the vocabulary with
the largest support. Here considerable use can be made of fuzzy set theory and Zadeh's related
idea of computing with words [22] to provide a mechanism for representing linguistic concepts in
terms of sets. A prototypical example of this situation is one in which the A_j are a collection of
fuzzy subsets, corresponding to linguistic terms related to the variable V. For example, if V is age
then these could be terms like old, young, middle age.
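Selection from a user supplied vocabulary can be sketched for the age example. The membership functions for the three terms, the Comp function, the data and the use of Min as the aggregation F are all hypothetical choices made for illustration; credibility and reasonableness are neglected.

```python
X = range(0, 101)  # hypothetical age domain


def comp(x, y):
    # Hypothetical compatibility: falls linearly to 0 at a distance of 10.
    return max(0.0, 1.0 - abs(x - y) / 10)


def young(y):
    return 1.0 if y <= 25 else max(0.0, (40 - y) / 15)


def middle_aged(y):
    if y <= 30 or y >= 65:
        return 0.0
    if y < 40:
        return (y - 30) / 10
    if y <= 55:
        return 1.0
    return (65 - y) / 10


def old(y):
    return 0.0 if y <= 55 else min(1.0, (y - 55) / 15)


vocabulary = {"young": young, "middle aged": middle_aged, "old": old}


def sup_i(A, a_i):
    """Sup_i(A) = Max_y[A(y) ∧ Comp(y, a_i)]."""
    return max(min(A(y), comp(y, a_i)) for y in X)


def sup(A, data):
    # F assumed to be the Min over the sources.
    return min(sup_i(A, a_i) for a_i in data)


data = [28, 33, 36]
best = max(vocabulary, key=lambda term: sup(vocabulary[term], data))
print(best)  # young
```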

Another, and more general, approach to restricting the subsets available as outputs of the
fusion process is to use a measure on the space of subsets of X to indicate our constraints. In this case
our measure is a mapping μ: 2^X → [0, 1] (see footnote 3). Here for any subset A of X, μ(A) indicates the
degree to which it is acceptable to the user to provide A as the fused value. We can denote this
measure as the Client Acceptability Measure (CAM).

With the availability of such a measure we can proceed in the following manner. We
calculate the support for A by source i as

Sup_i(A) = μ(A) ∧ Max_y[A(y) ∧ Sup_i(y)]

where as in the preceding, Sup_i(y) = g(w_i ∧ R(a_i), Comp(y, a_i) ∧ R(y)). Thus μ(A) bounds the
support available from any source.

If we neglect the reasonableness, assume R(x) = 1 for all x ∈ X, and assume w_i = 1 then
Sup_i(A) = μ(A) ∧ Max_y[A(y) ∧ Comp(y, a_i)]

3 We could define μ over the set of fuzzy subsets, μ: I^X → [0, 1].

Let us look at some natural features of a measure μ which is used to convey the
acceptability of considering a subset A as the fused value. One characteristic of a set that may be
important in determining its appropriateness as a fused value is its size or cardinality. Since the
fewer the number of elements in a subset, the more informative (useful) it is as a fused value, it is
natural to consider smaller sets more acceptable than larger sets. This observation is reflected in
the general feature that μ(A) ≥ μ(D) if A ⊆ D. A related property of this measure is that any
subset consisting of a singleton should be completely acceptable, μ({x}) = 1.

In applying these CAMs in the manner described above we are essentially trying to reflect
some criteria or requirements that a user has with respect to how they will use the fused value.
Often these criteria reflect some operational or cognitive need of the user. There are two
attributes associated with a subset which can be used to help in the expression of these criteria
within a CAM.

In general, when we allow subsets as fused values, we prefer them to contain elements that
are consistent (similar) rather than a collection of diverse values. In order to capture this feature
of granularization we can make use of the proximity relationship. Specifically, some indication of
the internal compatibility of the elements in A can be used to capture this consideration in the
determination of μ(A). To express this notion we can suggest using as the internal compatibility
of the subset A the formulation I-Comp(A) = Min_{x,y ∈ A}[Comp(x, y)]. Thus, here we take the
smallest compatibility of any two elements in A as its internal compatibility. Thus, I-Comp(A)
can be used to help in the formulation of μ(A) to aid in capturing this notion of consistency. In
the most basic application of this, we can define μ(A) = I-Comp(A). We note that if A ⊆ D then
I-Comp(A) ≥ I-Comp(D). If A is a singleton set, A = {x}, since Comp(x, x) = 1 for all x then
I-Comp(A) = 1.

In the above we implicitly assumed that A was a crisp set. In the case where A is fuzzy
we can define

I-Comp(A) = Min_{x,y ∈ X}[(1 − (A(x) ∧ A(y))) ∨ Comp(x, y)]

where the first term is the complement of A(x) ∧ A(y), the complement of a grade A(x) being
1 − A(x). Other more sophisticated definitions of the notion of internal
compatibility of a set can be obtained with the use of soft computing technologies such as
fuzzy modeling [23].
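Both forms of internal compatibility can be sketched as follows. The fuzzy version reads the first term of the formula as the complement of A(x) ∧ A(y), which is an interpretation of the extracted text rather than something the source states unambiguously; the Comp function and the domain X are hypothetical.

```python
from itertools import product


def i_comp_crisp(A, comp):
    """I-Comp(A) = Min over pairs x, y in A of Comp(x, y); A is a non-empty set."""
    return min(comp(x, y) for x, y in product(A, repeat=2))


def i_comp_fuzzy(A, X, comp):
    """Min over x, y in X of [(1 - (A(x) ∧ A(y))) ∨ Comp(x, y)].

    A maps elements to membership grades; unlisted elements have grade 0.
    """
    return min(
        max(1.0 - min(A.get(x, 0.0), A.get(y, 0.0)), comp(x, y))
        for x, y in product(X, repeat=2)
    )


def comp(x, y):
    # Hypothetical compatibility: falls linearly to 0 at a distance of 10.
    return max(0.0, 1.0 - abs(x - y) / 10)


X = range(0, 51)
print(i_comp_crisp({20, 25}, comp))                 # 0.5
print(i_comp_fuzzy({20: 1.0, 25: 1.0}, X, comp))    # 0.5, agrees with the crisp case
print(i_comp_crisp({20}, comp))                     # 1.0 for a singleton
```

Under this reading, pairs that are not both in A contribute 1 to the Min and so are ignored, which is why the fuzzy definition collapses to the crisp one when all grades are 0 or 1.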

The second feature of a subset that can be used in the formulation of the CAM to help
reflect the user's desires is the cardinality of the subset. As we have noted, this can help capture the
fact that users typically prefer smaller sets to bigger sets. Here we shall not pursue this topic but
only indicate that considerable use can be made of Zadeh's fuzzy set based idea of computing with
words to relate information about the cardinality of a set A and its value μ(A). For example, we
can capture a user's desire that the fusion set contain only a few elements. In this case we can,
using Zadeh's idea of linguistic quantifiers [24], represent "only a few" as a fuzzy subset Q of the
non-negative integers in which Q(1) = 1 and Q(x) ≥ Q(y) if x < y. Then for any A, Q(Card A) can be
used to indicate μ(A).

The construction of μ(A) must take into account the preferences of the user as well as the
structure of the underlying proximity relationship. Consider the situation in which Prox is an
equivalence relation. In this case we see I-Comp(A) = 0 if there exist x and y ∈ A from
different equivalence classes. Thus I-Comp(A) = 1 iff A is contained in an equivalence class,
otherwise it is zero. In this situation it would appear that the construction of μ using I-Comp(A)
is not appropriate. Here, when the proximity is an equivalence relationship, it is best to construct
μ based on some function of the cardinality of A.

We point out that the description of a user supplied vocabulary can be made with the use
of a CAM μ in which all words in the vocabulary have μ(A) = 1 and all those not in the
vocabulary have μ(A) = 0. In some situations we may describe our desired fused sets using both a
user supplied vocabulary as well as criteria based on I-Comp(A) and/or the cardinality of A. An
important example of this arises in the case where our proximity relationship is based on a linear
ordering L on X. Here we may require that our fused subsets be intervals. We recall that A is
an interval if there exist some x and z ∈ X such that A = {y | x ≥_L y ≥_L z}. In this case we may
also associate with each interval a value μ_2(A) which depends on I-Comp(A) and a value μ_3(A)
which depends on Q(Card A). Then we use

μ(A) = μ_1(A) ∧ μ_2(A) ∧ μ_3(A)

where μ_1(A) = 1 if A is an interval and μ_1(A) = 0 if A is not an interval.
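A combined CAM of this kind can be sketched for crisp subsets of the integers. The Comp function, the quantifier "only a few" and the choice μ_2(A) = I-Comp(A), μ_3(A) = Q(Card A) are hypothetical instances picked for the sketch.

```python
def is_interval(A):
    """True if the non-empty integer set A contains every integer between its ends."""
    return len(A) > 0 and set(A) == set(range(min(A), max(A) + 1))


def comp(x, y):
    # Hypothetical compatibility: falls linearly to 0 at a distance of 10.
    return max(0.0, 1.0 - abs(x - y) / 10)


def few(n):
    """Hypothetical quantifier 'only a few': Q(1) = 1 and non-increasing in n."""
    return min(1.0, max(0.0, 1.0 - (n - 1) / 10))


def cam(A):
    """mu(A) = mu_1(A) ∧ mu_2(A) ∧ mu_3(A), with ∧ taken as Min."""
    if not A:
        return 0.0  # assumption: the empty set is not an acceptable answer
    mu1 = 1.0 if is_interval(A) else 0.0
    mu2 = min(comp(x, y) for x in A for y in A)   # I-Comp(A)
    mu3 = few(len(A))                              # Q(Card A)
    return min(mu1, mu2, mu3)


print(cam({4}))        # 1.0, singletons are fully acceptable
print(cam({3, 5}))     # 0.0, not an interval
```

For a longer interval such as {3, 4, 5} both the internal compatibility and the cardinality term pull the acceptability below 1.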


11.  Multiple  Granular  Objects  as  Fused  Values 

In the following we shall make some other observations regarding the presentation of fused
information. Initially we addressed the problem of selecting an element from X as our fused
value. We then considered the situation in which we allowed our fused value to be some subset of
X. In this case our fused value is essentially multiple elements from X. Here we introduced a
CAM, a measure μ on the subsets of X, to reflect the user's desires with respect to the acceptability of
different subsets.

We will now take this one step further by considering the situation in which we allow
multiple subsets of X as the fused value, for example multiple intervals or multiple user
vocabulary words. Here a basic consideration is how many subsets we can use. In order to do
this we must introduce an additional measure η. Let C be a set whose elements are subsets of X;
if P is the power set of X, C is a subset of P. We define η: 2^P → [0, 1], where for any C,
η(C) = Q(Card(C)). That is, η(C) depends on the cardinality of C, the number of subsets of X in C.
Thus if q is any integer, Q(q) indicates the acceptability of providing q subsets. Here we require
Q(1) = 1 and generally we expect Q(i) ≥ Q(j) if i < j.

Using this we can now express the support of any subset C of the power set as

Sup(C) = F(Sup_1(C), ..., Sup_q(C))

Let us determine the support for C from source i. We can express this as:

Sup_i(C) = η(C) ∧ Max_{A ∈ C}[Sup_i(A)]

Thus Sup_i(C) depends upon the support from source i for all the subsets of X that are in C. Since
we have already provided a way for determining Sup_i(A) we are able to determine Sup_i(C). We
note that if η(C) = 0 for Card(C) > 1 we then have effectively reduced this to a case where we
only allow one subset.
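The support for a collection of subsets from one source can be sketched as below. The crisp subset support, the Comp function and the two quantifiers are hypothetical choices for illustration; C is assumed non-empty.

```python
def comp(x, y):
    # Hypothetical compatibility: falls linearly to 0 at a distance of 10.
    return max(0.0, 1.0 - abs(x - y) / 10)


def sup_subset(A, a_i):
    """Sup_i(A) for a crisp non-empty A: Max over y in A of Comp(y, a_i)."""
    return max(comp(y, a_i) for y in A)


def collection_support(C, a_i, Q):
    """Sup_i(C) = eta(C) ∧ Max_{A in C} Sup_i(A), with eta(C) = Q(Card(C))."""
    return min(Q(len(C)), max(sup_subset(A, a_i) for A in C))


def few(n):
    # Hypothetical quantifier: Q(1) = 1, decreasing by 0.1 per extra subset.
    return min(1.0, max(0.0, 1.0 - (n - 1) / 10))


def only_one(n):
    # Quantifier with eta(C) = 0 whenever Card(C) > 1.
    return 1.0 if n == 1 else 0.0


C = [frozenset({20, 21}), frozenset({40, 41})]
print(collection_support(C, 40, few))        # about 0.9, capped by Q(2)
print(collection_support(C, 40, only_one))   # 0.0, two subsets are not allowed
print(collection_support(C[:1], 40, few))    # 0.0, {20, 21} is incompatible with 40
```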

PART II. FUZZY METHODS IN WEB QUESTION ANSWERING
SYSTEMS

1.  Introduction 

The widespread availability of the internet has generated considerable interest in
electronic sources of information. Various types of software applications have been developed to
support this interest. We can generically refer to these as Applications Useful for Seeking
Information (AUSIN). One widely used class of these AUSIN is search engines. These
applications retrieve pointers to pages or files based on their matching some key words specified
by the user. Another widely used class of AUSIN is database systems. In some ways database
applications and search engines are opposite extremes of these AUSIN. Database applications
give precise information; however, they only work in highly structured environments. The search
engines, while only giving imprecise responses in the form of pointers, are capable of functioning
in highly unstructured, almost chaotic environments. However, neither of these two applications
typically has a reasoning capacity.

Another class of AUSIN is question answering systems [1-6]. An important dimension
along which question answering systems differ from search engines and most databases is in their
reasoning ability [6]. A fundamental characteristic of question answering systems is their ability to
reason over their information base. This facility leads to a fundamental difference in the nature of
their response to queries. In particular the response from a search engine is a pointer to a
document (file or web page) resident in its library (see footnote 4). The database responds essentially by
providing some value already resident in the database. A question answering system, because of
its reasoning capacity, can construct new knowledge that is not resident in its knowledge base in
response to a query. While currently less pervasive than either of the other information seeking
systems, question answering systems with reasoning ability have the potential of being much
more useful with regard to their responding to the real desires of the human user.

4 We use the word library very broadly in that it could mean a subspace of the internet.

Intelligent question-answering systems generally require a gathering of information
relevant to a question to be answered and then a processing of (reasoning over) this information.
The processing often requires some appropriate representation of the information to be used. The
general chaotic nature of the Internet makes the dynamic unrestrained gathering and subsequent
representation of information very difficult. It often requires levels of understanding well beyond
our current capability for computational intelligence (see footnote 5). The manipulation of appropriately
gathered and represented knowledge is more within our current grasp.

Our  point  of  departure  in  this  work  will  be  the  availability  of  a  domain  knowledge  base 
that  has  been  built,  with  the  aid  of  possible  human  interaction,  using  knowledge  obtained  from  the 
internet  as  well  as  local  domain  knowledge.  We  do  not  preclude  the  system  from  dynamically 
interacting  with  appropriate  Internet  sources  in  response  to  queries.  We  can  envision  knowledge 
bases  of  this  type  being  available  within  many  organizations. 

While the types of questions envisioned to be processed by future question answering
systems are wide and varied, in this work we restrict ourselves to questions related to finding the
value of some attribute variable using the information in the knowledge base.

Our  interest  in  this  work  is  on  the  development  of  tools  to  help  in  the  representation  of 
various  types  of  knowledge  as  well  as  on  the  development  of  a  framework  and  mechanism  for 
reasoning and manipulating the knowledge in response to users' questions.

Our  approach  will  be  in  the  spirit  of  Zadeh's  paradigm  of  computing  with  words  [7,  8].  In 
providing  a  schema  for  representing  and  locally  manipulating  knowledge  we  shall  rely  heavily  on 
the  fuzzy  set  based  theory  of  approximate  reasoning  [9]  and  particularly  the  idea  of  protoforms 
[10,  11].  Considerable  use  will  be  made  of  knowledge  trees  [12,  13]  to  provide  a  global 
framework  for  structuring  and  directing  the  process  of  answering  a  question  posed  to  the  system. 
As  we  shall  subsequently  see  a  knowledge  tree  takes  the  contents  of  the  knowledge  base  and 
structures  it  in  a  manner  to  answer  the  question  presented.  These  trees  can  be  seen  as  a 
mobilization of the knowledge base to address a particular task. At a meta level the knowledge
tree provides a global plan for solving the problem, the answering of the question. It is worth
noting, although we shall not pursue it here, that a knowledge tree can be used to help point directions
for going outside our knowledge base for information to improve our answer.

5 This is to say nothing about the important problem of verifying the quality of the information.


2.  Basic  Concepts  of  Approximate  Reasoning 

The primary elements in an Approximate Reasoning (AR) representation of a knowledge
base are a collection of attribute variables, V_j for j = 1 to n, called the atomic variables. These
attribute variables are the objects of interest in the current context. It is information about the
value of these variables and relationships between them that constitutes the knowledge of interest.
Associated with each variable V_j is a set X_j, indicating the allowable values for the variable. The
terms base set, domain and universe of discourse are used to indicate the set X_j. A joint variable
is any tuple of one or more distinct atomic variables; V_3, (V_2, V_5) and (V_1, V_2, V_6) are
examples of joint variables. Associated with any joint variable is a base set consisting of the
Cartesian product of the domains of the individual variables making up the joint variable. Thus, if
a joint variable has q components, its domain is a set of q-tuples. It is implicitly assumed, unless
otherwise stated, that each variable of interest can assume only one value in its base set (see [14]
for a discussion of different variable types).

A categorical proposition (or statement) in the theory of approximate reasoning is of the form
V is A; here V is a joint variable and A is a fuzzy subset (see footnote 6) of the domain of V. A proposition
involving only one variable is called a canonical proposition while those involving two or more
variables are called relational or joint propositions. A large variety of types of knowledge can be
represented by propositions of the above form.

A proposition in AR is viewed as imposing constraints on the possible values of the
variables involved. For example, if A is a crisp subset then the meaning of the proposition
V_1 is A is to indicate that the value for the variable V_1 is restricted to be a member of the set A,
that is, elements in A are the only possible values for V_1. The use of fuzzy subsets provides for a
grading of this idea. If B is a fuzzy subset, then one meaning of the proposition V_1 is B is that for
any x ∈ X_1, B(x) indicates the possibility that V_1 = x. If V = (V_1, V_2, V_3) is a joint variable then
the meaning of the proposition V is M is that for any (x_1, x_2, x_3) ∈ X_1 × X_2 × X_3, M(x_1, x_2, x_3)
is the possibility that V_1 = x_1, V_2 = x_2, and V_3 = x_3. Thus, we see that the knowledge
represented by the propositions is essentially contained in the fuzzy subsets associated with the
propositions.

6 We shall follow the convention of expressing a fuzzy subset A of X as A = ∪_{x ∈ X} {A(x)/x}.

We shall say a proposition, V is M, is normal (consistent) if Max_x[M(x)] = 1, that is, there
exists at least one element in the base set of V that has possibility one. We shall call a proposition
subnormal if Max_x[M(x)] < 1. We call a proposition V is M a tautology if M(x) = 1 for all x in
the domain of V. It should be noted that a tautology essentially provides no restriction on the
value of a variable and thus induces no new information other than that the variable must be in its
base set.
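These three notions translate directly into checks on a membership function. The sketch below assumes a finite base set with M stored as a dictionary that lists a grade for every element; the function names and example data are illustrative only.

```python
def is_normal(M):
    """V is M is normal if Max_x[M(x)] = 1."""
    return max(M.values()) == 1.0


def is_subnormal(M):
    """V is M is subnormal if Max_x[M(x)] < 1."""
    return max(M.values()) < 1.0


def is_tautology(M):
    """V is M is a tautology if M(x) = 1 for every x in the base set."""
    return all(grade == 1.0 for grade in M.values())


about_thirty = {29: 0.8, 30: 1.0, 31: 0.8}
conflicting = {29: 0.3, 30: 0.6, 31: 0.2}
anything = {29: 1.0, 30: 1.0, 31: 1.0}

print(is_normal(about_thirty), is_subnormal(conflicting), is_tautology(anything))
# True True True
```

A tautology is in particular normal, so the checks are not mutually exclusive.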

When  using  the  AR  framework  we  are  generally  involved  in  one  of  three  tasks.  The  first 
task,  translation  or  knowledge  representation  deals  with  the  process  of  taking  knowledge  normally 
expressed  in  natural  language  and  converting  it  into  an  appropriate  representation  within  the 
framework  of  AR.  We  construct  our  knowledge  base  via  this  translation  process.  The  second 
task  is  extracting  information  from  the  knowledge  base.  This  task  is  called  the  inference  process. 
It  involves  the  manipulation  of  propositions  to  obtain  other  propositions.  The  third  task  is  that  of 
retranslation,  where  we  are  interested  in  converting  the  propositions  obtained  via  the  inference 
process  into  statements  of  natural  language.  Here  we  shall  mainly  focus  on  the  second  task, 
inference,  and  to  a  lesser  extent  on  the  translation  task.  In  [8]  we  have  addressed  the  retranslation 
problem. 

The  basic  operations  used  for  knowledge  manipulation  within  the  AR  framework  are 
conjoin,  containment  and  negation.  We  now  look  at  these  operations. 

The conjoin operation provides the system with the facility for combining or fusing
information. In the translation process it allows us to represent knowledge involving multiple
variables and allows us to construct joint variables from atomic variables. The inference process
uses this operation to combine individual pieces of knowledge. The operation of conjoin is a
generalization and unification of the set operations of conjunction and Cartesian product. It is also
closely related to the join operation used in databases.

Definition: Assume V_a and V_b are two joint variables on the universes X and Y respectively. Let
V_a is D and V_b is E be two propositions. Their conjoin (conjunction), denoted V_a is D × V_b is E, is
the proposition V is F. Here V is a joint variable consisting of the union of the atomic variables
making up V_a and V_b. F is a fuzzy subset of the domain of V, Z, such that for each z ∈ Z we
have

F(z) = D(x) ∧ E(y)

where x is the portion of z corresponding to the domain X and y is the portion corresponding to
the domain Y. The operator ∧ is the Min, although more generally it can be a t-norm [15].

In the case when the two variables being conjoined are the same, the conjoin operation
reduces to the usual intersection of fuzzy sets:

$V_a$ is $D$ $\times$ $V_a$ is $E$ = $V_a$ is $F$ where $F = D \cap E$

In this situation we shall sometimes find it convenient to use $\cap$ in place of $\times$. In the case when
the two joint variables being conjoined have no common variables, this operation reduces to the
cartesian product: $V_a$ is $D$ $\times$ $V_b$ is $E$ = $(V_a, V_b)$ is $F$ where $F = D \times E$.
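The conjoin definition and its two special cases can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: finite domains, membership functions stored as dictionaries keyed by value tuples, and Min as the t-norm. The function name `conjoin` and the variable names `V1`, `V2` are hypothetical and do not come from the report.

```python
from itertools import product

def conjoin(vars_a, D, vars_b, E, domains):
    """Conjoin 'vars_a is D' with 'vars_b is E' using Min as the t-norm.

    vars_a, vars_b: tuples of atomic variable names (hypothetical names).
    D, E: dicts mapping value tuples (ordered as vars_a / vars_b) to grades.
    domains: dict mapping each atomic variable name to its list of values.
    Returns (joint_vars, F), where F maps joint tuples to membership grades.
    """
    # The joint variable is the union of the atomic variables, order preserved.
    joint = tuple(dict.fromkeys(vars_a + vars_b))
    F = {}
    for z in product(*(domains[v] for v in joint)):
        assignment = dict(zip(joint, z))
        x = tuple(assignment[v] for v in vars_a)  # portion of z for vars_a
        y = tuple(assignment[v] for v in vars_b)  # portion of z for vars_b
        F[z] = min(D.get(x, 0.0), E.get(y, 0.0))
    return joint, F

domains = {"V1": ["a", "b"], "V2": [1, 2]}
D = {("a",): 1.0, ("b",): 0.5}
E = {("a",): 0.25, ("b",): 0.75}
G = {(1,): 1.0, (2,): 0.5}

# Same variable on both sides: the result is the fuzzy intersection.
same_vars, F_same = conjoin(("V1",), D, ("V1",), E, domains)
# No common variables: the result is the fuzzy cartesian product.
disjoint_vars, F_prod = conjoin(("V1",), D, ("V2",), G, domains)
```

The same function covers partially overlapping joint variables, since each joint tuple is simply split into its two portions before the Min is taken.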

We now define a special conjoin operation. This operation makes propositions that are
not necessarily about the same variable be about the same variable.

Definition: Assume $V_a$ and $V_b$ are two joint variables such that $V_b$ contains all the variables that
are in $V_a$. The cylindrical extension of the proposition $V_a$ is $F$ to the proposition $V_b$ is $F^\circ$ is
defined by

$V_b$ is $F^\circ$ = $V_a$ is $F$ $\times$ $V_1$ is $X_1$ $\times$ $V_2$ is $X_2$ $\times \cdots \times$ $V_q$ is $X_q$

where $V_1, V_2, \ldots, V_q$ are the atomic variables in $V_b$ that are not in $V_a$ and the $X_i$'s are the base
sets of these variables.

Note: We can equivalently express this cylindrical extension of $V_a$ is $F$ to $V_b$ by
$V_b$ is $F^\circ$ = $V_a$ is $F$ $\times$ $V_b$ is $X$
where $X$ is the domain of the variable $V_b$.

The membership function $F^\circ$ can be obtained from the membership function of $F$ by
setting $F^\circ(y) = F(x)$, where $x$ is the tuple in the base set of $V_a$ that corresponds to the portion of $y$
in this subspace.
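As a minimal sketch of the cylindrical extension, assuming finite domains and dictionary-based membership functions, each tuple of the larger variable simply inherits the grade of its portion in the smaller one. The names `cylindrical_extension`, `V1` and `V2` are illustrative assumptions, not part of the report.

```python
from itertools import product

def cylindrical_extension(vars_a, F, vars_b, domains):
    """Extend 'vars_a is F' to the larger joint variable vars_b.

    Every tuple y over vars_b receives the grade F(x), where x is the
    part of y that belongs to the atomic variables in vars_a.
    """
    if not set(vars_a) <= set(vars_b):
        raise ValueError("vars_b must contain every variable in vars_a")
    idx = [vars_b.index(v) for v in vars_a]
    return {
        y: F.get(tuple(y[i] for i in idx), 0.0)
        for y in product(*(domains[v] for v in vars_b))
    }

domains = {"V1": ["a", "b"], "V2": [1, 2, 3]}
F = {("a",): 1.0, ("b",): 0.25}
F_ext = cylindrical_extension(("V1",), F, ("V1", "V2"), domains)
```

The extended membership function does not depend on the added variable `V2` at all, which is why the extension adds no information.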

We now turn to the second basic operation used in approximate reasoning, containment.
At a very fundamental level this operation provides us with a facility for ordering propositions in
approximate reasoning.

Definition: Assume $V_a$ is $D$ and $V_b$ is $E$ are two propositions. We say that $V_a$ is $D$ contains
$V_b$ is $E$, denoted $V_b$ is $E$ $\subseteq$ $V_a$ is $D$, if

$E^\circ(z) \le D^\circ(z)$ for all $z$

where $E^\circ$ and $D^\circ$ are the cylindrical extensions of $E$ and $D$ to the base set of the variable $V$, the union
of the atomic variables in $V_a$ and $V_b$.

Note 1: If $V_a$ and $V_b$ are the same variables then this is the definition of fuzzy containment
suggested by Zadeh [16].

Note 2: It can be shown that if $P_1$ and $P_2$ are two propositions and $P_3 = P_1 \times P_2$ then
$P_3 \subseteq P_1$ and $P_3 \subseteq P_2$.

Containment allows us to define equivalence of propositions.

Definition: Two propositions $P_1$ and $P_2$ are equivalent if $P_1 \subseteq P_2$ and $P_2 \subseteq P_1$. We shall denote
this as $P_1 = P_2$.

Note: Equivalent propositions are interchangeable under conjoin. Assume $P_1$ and $P_2$ are
equivalent, $P_1 = P_2$, and let $P_4$ be any other proposition; then $P_1 \times P_4 = P_2 \times P_4$.
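Containment and equivalence reduce to pointwise comparisons once both membership functions live on the same base set. The sketch below assumes that the cylindrical extensions have already been taken, so both dictionaries are keyed by the same elements; the helper names `contains` and `equivalent` are hypothetical.

```python
def contains(E, D):
    """True if E(z) <= D(z) for every z, i.e. 'V is E' is contained in 'V is D'.

    Both arguments are assumed to be extended to a common base set already.
    """
    return all(E.get(z, 0.0) <= D.get(z, 0.0) for z in set(E) | set(D))

def equivalent(P1, P2):
    """Two propositions are equivalent when each contains the other."""
    return contains(P1, P2) and contains(P2, P1)

P1 = {"x": 1.0, "y": 0.5, "z": 0.0}
P2 = {"x": 0.25, "y": 0.75, "z": 1.0}
# Conjoin of two propositions about the same variable (pointwise Min).
P3 = {z: min(P1[z], P2[z]) for z in P1}

p3_in_p1 = contains(P3, P1)
p3_in_p2 = contains(P3, P2)
p1_in_p2 = contains(P1, P2)
```

Here `P3` is contained in both `P1` and `P2`, as Note 2 states, while `P1` and `P2` themselves are not ordered.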

The third basic operation in AR is negation. While the other two operations involve more
than one proposition, negation involves only one proposition.

Definition: Assume $V$ is $A$ is a proposition where the domain is $X$. The negation of $V$ is $A$,
denoted not ($V$ is $A$), is the proposition $V$ is $\bar{A}$, where $\bar{A}(x) = 1 - A(x)$ for each $x$.

An important relationship exists between negation and containment.
Theorem: If $V_a$ is $A$ $\subseteq$ $V_b$ is $B$ then $V_b$ is $\bar{B}$ $\subseteq$ $V_a$ is $\bar{A}$.

Proof: If $V_a$ is $A$ $\subseteq$ $V_b$ is $B$ then for all $z$ in the extensions $A^\circ(z) \le B^\circ(z)$. This implies
$\bar{B}^\circ(z) \le \bar{A}^\circ(z)$ and the result follows.

We shall here introduce an operation which is based upon a combination of conjoin and
negation.

Definition: Assume $V_a$ is $A$ and $V_b$ is $B$ are two propositions. The operation $\perp$ is such that
$V_a$ is $A$ $\perp$ $V_b$ is $B$ = not ($V_a$ is $\bar{A}$ $\times$ $V_b$ is $\bar{B}$) = $\overline{(V_a \text{ is } \bar{A} \times V_b \text{ is } \bar{B})}$.

Note: $V_a$ is $A$ $\perp$ $V_b$ is $B$ can be expressed as $V$ is $D$, where $V$ is the union of the atomic variables
in the joint variables $V_a$ and $V_b$ and $D$ is a fuzzy subset on the universe of $V$ such that

$D(z) = A(x) \vee B(y)$,

where $\vee$ is the Max operator, $x$ is the component of the tuple $z$ corresponding to the domain of $V_a$
and $y$ is the component of the tuple $z$ corresponding to the domain of $V_b$.

Note: If $V_a$ and $V_b$ are the same, the operator $\perp$ is the union. With this in mind we shall use
$\cup$ synonymously with $\perp$.
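The claim that negating the conjoin of the negations yields the Max can be checked numerically. The following sketch assumes two disjoint variables with small finite domains and Min as the conjoin operator; the helper names are illustrative only.

```python
def negate(A):
    """Negation: membership grade 1 - A(x) for each x."""
    return {x: 1.0 - g for x, g in A.items()}

def conjoin_disjoint(A, B):
    """Conjoin of two propositions on disjoint variables (Min over pairs)."""
    return {(x, y): min(a, b) for x, a in A.items() for y, b in B.items()}

def disjoin(A, B):
    """not(not A conjoined with not B), the combined operation from the text."""
    return negate(conjoin_disjoint(negate(A), negate(B)))

A = {"x1": 0.25, "x2": 1.0}
B = {"y1": 0.5, "y2": 0.0}
D = disjoin(A, B)
# Direct Max form given in the Note, for comparison.
D_max = {(x, y): max(A[x], B[y]) for x in A for y in B}
```

The grades are chosen as exact binary fractions so that the two dictionaries compare equal without floating-point tolerance.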

The preceding operations are essentially logical operations. We now introduce another
operation that will play an important role in the process of question-answering, the Zadeh
extension principle [14, 16-18].

Assume $X$ and $Y$ are two crisp sets and let $f$ be a mapping from $X$ into $Y$, $f: X \to Y$.
Here for each $x \in X$, we have $f(x) = y \in Y$. Assume $A = \bigcup_{x \in X} \{A(x)/x\}$ is a fuzzy subset of $X$.
Using the Zadeh extension principle, we can define $f(A)$ as a fuzzy subset of $Y$ such that

$f(A) = \bigcup_{x \in X} \{A(x)/f(x)\}$

We see the Zadeh extension principle allows us to extend an operation on elements of a set to act on
fuzzy subsets.

As a simple illustration assume $X = \{1, 2, 3\}$ and $Y = \{a, b, c, d, e\}$. Let $f: X \to Y$ be
defined as $f(1) = a$, $f(2) = e$ and $f(3) = b$. Let $A = \{A(1)/1, A(2)/2, A(3)/3\}$; then
$f(A) = \{A(1)/a, A(2)/e, A(3)/b\}$.

Note 1: It can be shown that if we denote $B = f(A)$ then $B(y) = \max_{x \text{ s.t. } f(x) = y} [A(x)]$.

Note 2: Often the extension principle is used when the spaces $X$ and $Y$ are the same. For example
if $f$ is $y = 2x$.
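The single-variable extension principle, in the Max form of Note 1, can be sketched as follows. The membership grades used here are hypothetical illustration values, and the function name `extend` is an assumption of this sketch rather than anything defined in the report.

```python
def extend(f, A):
    """Image f(A) of a fuzzy subset A: B(y) = max of A(x) over all x with f(x) = y."""
    B = {}
    for x, grade in A.items():
        y = f(x)
        B[y] = max(B.get(y, 0.0), grade)
    return B

# One-to-one mapping f(1) = a, f(2) = e, f(3) = b; the grades are hypothetical.
A = {1: 0.5, 2: 1.0, 3: 0.25}
f = {1: "a", 2: "e", 3: "b"}.get
B = extend(f, A)

# A many-to-one mapping shows where the Max matters: -1 and 1 both map to 1.
C = extend(lambda x: x * x, {-1: 0.5, 0: 0.25, 1: 1.0})
```

When several elements map to the same image, the image keeps the largest of their grades, which is the only place the Max has any effect.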

The Zadeh extension principle can be generalized. Let $X_1, X_2, \ldots, X_n$ and $Y$ be a
collection of sets. Assume $f$ is a mapping $f: X_1 \times X_2 \times \cdots \times X_n \to Y$. That is, for each tuple
$(x_1, \ldots, x_n) \in X_1 \times X_2 \times \cdots \times X_n$ we have $f(x_1, \ldots, x_n) = y \in Y$. Let $A_i$ be a fuzzy subset of $X_i$. The
extension principle allows for the evaluation of $f(A_1, \ldots, A_n)$. Denoting $f(A_1, \ldots, A_n) = B$, where
$B$ is a fuzzy subset of $Y$, we have

$B = \bigcup_{x_1, \ldots, x_n} \{ \min_i [A_i(x_i)] / f(x_1, \ldots, x_n) \}$

thus we have

$B(y) = \max_{x_1, \ldots, x_n \text{ s.t. } f(x_1, \ldots, x_n) = y} [A_1(x_1) \wedge A_2(x_2) \wedge \cdots \wedge A_n(x_n)]$

This form of the extension principle allows us to define arithmetic operations on fuzzy
numbers. Assume $R$ is the set of real numbers. Addition is a mapping from $R \times R$ into $R$. That is,
for each $x, y \in R$ we have $f(x, y) = z$ where $x + y = z$. If $A$ and $B$ are two fuzzy numbers their
sum $C = A + B$ is a fuzzy number such that

$C = \bigcup_{(x, y)} \{ (A(x) \wedge B(y)) / (x + y) \}$

Here we see for any $z \in R$

$C(z) = \max_{(x, y) \text{ s.t. } x + y = z} [A(x) \wedge B(y)]$

More generally, if $\perp$ is any binary arithmetic operation then with $A$ and $B$ fuzzy numbers
we have $A \perp B = C$, where $C$ is also a fuzzy number such that

$C = \bigcup_{(x, y)} \{ (A(x) \wedge B(y)) / (x \perp y) \}$

Equivalently, for all $z \in R$ we have

$C(z) = \max_{(x, y) \text{ s.t. } x \perp y = z} [A(x) \wedge B(y)]$

Here $\perp$ can be addition, subtraction, multiplication, division or exponentiation. It can also be the
maximum or minimum of two numbers.
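The Max-Min formula for arithmetic on fuzzy numbers can be sketched for fuzzy numbers with a finite support. The restriction to discrete supports is an assumption made only so the example is computable; the text itself works over the real line. The name `fuzzy_op` and the sample fuzzy numbers are hypothetical.

```python
import operator

def fuzzy_op(op, A, B):
    """Apply a binary operation to two discrete fuzzy numbers.

    C(z) is the maximum of min(A(x), B(y)) over all pairs with op(x, y) == z.
    """
    C = {}
    for x, a in A.items():
        for y, b in B.items():
            z = op(x, y)
            C[z] = max(C.get(z, 0.0), min(a, b))
    return C

about_2 = {1: 0.5, 2: 1.0, 3: 0.5}
about_3 = {2: 0.5, 3: 1.0, 4: 0.5}

total = fuzzy_op(operator.add, about_2, about_3)  # fuzzy sum
larger = fuzzy_op(max, about_2, about_3)          # fuzzy maximum
```

Passing a different two-argument function, such as `operator.mul` or `min`, gives the other operations listed in the text without changing the routine.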

We shall provide another form of the extension principle involving fuzzy subsets. In
anticipation of providing this form of the extension principle we introduce the idea of level sets.
Let $A$ be a fuzzy subset of $X$. The $\alpha$ level set of $A$, denoted $A_\alpha$, is a crisp subset of $X$ defined as
$A_\alpha = \{x \mid A(x) \ge \alpha\}$. Again let $X$ and $Y$ be two crisp sets and assume $G: 2^X \to Y$ is a
mapping from subsets of $X$ into $Y$. We can extend $G$ to map fuzzy subsets of $X$. In particular,

$G(A) = \bigcup_{\alpha \in [0, 1]} \{ \alpha / G(A_\alpha) \}$

Here $G(A)$ is a fuzzy subset of $Y$.
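The level-set form of the extension principle can be sketched for a finite fuzzy subset. The sketch assumes the level set is taken with "greater than or equal to", and it only visits the positive grades that actually occur in A, since for a finite A the level sets change only at those values. The names `level_set` and `extend_set_function` are hypothetical, and set cardinality is used as an example of a set function G.

```python
def level_set(A, alpha):
    """Crisp alpha level set: all x whose grade is at least alpha."""
    return frozenset(x for x, g in A.items() if g >= alpha)

def extend_set_function(G, A):
    """Extend a function on crisp subsets to a fuzzy subset A.

    Each value G(A_alpha) receives the largest alpha that produces it.
    """
    result = {}
    for alpha in sorted({g for g in A.values() if g > 0}):
        y = G(level_set(A, alpha))
        result[y] = max(result.get(y, 0.0), alpha)
    return result

A = {"a": 1.0, "b": 0.5, "c": 0.25}
# G = len, the cardinality of a crisp subset, is used purely as an illustration.
card = extend_set_function(len, A)
```

The result is a fuzzy subset of the integers, read as "the cardinality is 1 with grade 1.0, 2 with grade 0.5, 3 with grade 0.25".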


3.  Semantics  of  AR  the  Knowledge  Representation 

An important task in the use of the AR framework in question-answering systems is the
translation of our knowledge from statements in a natural language into propositions in AR. Here
we first provide some basic understanding of the semantics of our formal representation. The
most basic translation rule is the assignment of a value to an atomic variable. An example of this
occurs if we have the information that John is young. We first represent the concept young as a
fuzzy subset $A$ and then assign it to constrain the value of the variable $V$, John's age. Using this
representation we are saying that there is some uncertainty with regard to our knowledge of the
value of $V$ and $A(x)$ is the possibility that $V = x$.

Some specific cases of $A$ are worth noting. If $A = \{x^*\}$, a singleton, then $V$ is $A$ conveys
the fact that $x^*$ is the exact value of $V$. This is equivalent to the ordinary statement $V = x^*$.
Another special case is when $A$ is a crisp set. In this case, the statement $V$ is $A$ is effectively
indicating that $V$ is an element in $A$, $V \in A$. Another important example of this is when $A = X$.
Here we are saying that $V$ can be any value in $X$. This corresponds to the case when we have no
knowledge about $V$ other than its domain. It is clear that this corresponds to a least restrictive
constraint and can always be assumed.

We note that if $\max_x[A(x)] < 1$ then there is no element in $X$ having a possibility of one of
being the value of $V$. It is an indication of some conflict between our knowledge and the
assumption that $V$ must assume its value in $X$. The extreme case is when $A = \emptyset$; here no element
has any possibility of being the value of $V$. The information out of which the knowledge base is
constructed is usually assumed to involve fuzzy subsets that are normal, $\max_x[A(x)] = 1$. The
conflicts generally arise as a result of the process of combining pieces of knowledge.

Consider the statement $V$ is $A$, where for ease of explanation we assume $A$ is crisp. As we
indicated, this statement is saying that our knowledge of $V$ is that its value lies in the subset $A$. If
we know that the value of $V$ lies in the subset $A$ then if $A \subseteq B$ we also know that the value of $V$
lies in $B$. For example, if we know that John is in his twenties then we can infer that he is over
fifteen years old. This can be seen as a kind of basic inference process. We call this process
entailment. While we are able to go from smaller subsets to larger subsets with certainty, we can't
go the other way around, from larger subsets to smaller subsets, with the same certainty. For
example, knowing John is in his twenties doesn't allow us to conclude that John is between 21 and
23.

Consider now that we have two pieces of knowledge about the same variable: $V$ is $A$ and
$V$ is $B$. Again for simplicity of discussion we shall assume $A$ and $B$ are crisp subsets. The first
statement indicates that the possible value for $V$ lies in the set $A$ and the second statement says
that the possible value for $V$ lies in the set $B$. Consider the situation where one statement says
John is between 10 and 20 and the other says that he is over 15. In this case we can conclude that
he is between 15 and 20. More generally, in the case of multiple pieces of information we take the
conjunction; that is, we conclude $V$ is $D$ where $D = A \cap B$. As we shall subsequently see, the
processes of conjunction and entailment form the basis of reasoning.

As we have indicated, statements of the form $V$ is $A$ allow for a representation which can
capture uncertainty in our information. A number of measures have been introduced to help
quantify different aspects of the uncertainty associated with these propositions. One of these is
the concept of specificity of a fuzzy subset. In [19] Yager describes its properties and a number
of possible formulations for quantifying it. Assume $A$ is a fuzzy subset of $X$ and let $x^*$ be an
element with maximal membership grade in $A$, $A(x^*) = \max_x[A(x)]$, and let $\mathrm{Aver}_{x \ne x^*}[A(x)]$ be
the average of all membership grades in $A$ excluding $x^*$. Then we define the specificity of $A$ as
$\mathrm{Sp}(A) = A(x^*) - \mathrm{Aver}_{x \ne x^*}[A(x)]$. We see some properties of this measure:
1. $\mathrm{Sp}(A) = 1$ if $A$ consists of exactly one element, $A = \{x\}$.
2. $\mathrm{Sp}(\emptyset) = \mathrm{Sp}(X) = 0$.
3. If $A$ and $B$ are two normal fuzzy subsets of $X$, then $\mathrm{Sp}(A) \ge \mathrm{Sp}(B)$ if $A \subseteq B$.
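The specificity measure and its three listed properties can be checked on a small universe. This sketch assumes a finite universe passed explicitly and treats missing grades as zero; the function name `specificity` and the sample sets are hypothetical.

```python
def specificity(A, universe):
    """Sp(A) = A(x*) minus the average grade of all other elements of the universe."""
    grades = sorted((A.get(x, 0.0) for x in universe), reverse=True)
    if not grades:
        raise ValueError("universe must not be empty")
    top, rest = grades[0], grades[1:]
    average_rest = sum(rest) / len(rest) if rest else 0.0
    return top - average_rest

universe = ["a", "b", "c"]
sp_singleton = specificity({"a": 1.0}, universe)                   # property 1
sp_empty = specificity({}, universe)                                # property 2
sp_whole = specificity({"a": 1.0, "b": 1.0, "c": 1.0}, universe)    # property 2

# Property 3: A is contained in B and both are normal.
A = {"a": 1.0, "b": 0.5}
B = {"a": 1.0, "b": 0.5, "c": 0.5}
sp_A = specificity(A, universe)
sp_B = specificity(B, universe)
```

Sorting the grades means that when several elements share the maximal grade, one of them plays the role of $x^*$ and the others count toward the average.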

One application of the concept of specificity is that it provides a measure of the amount of
information contained in a proposition in AR. Thus if $P$ is a proposition expressible as $V$ is $A$
then the amount of information contained in $P$ is $\mathrm{Inf}(P) = \mathrm{Sp}(A)$. An important tool in AR is the
principle of Minimal Specificity [20]. This states that if we have a number of different possible
representations of some piece of knowledge, then we should choose the one with the minimal
specificity. We see this as being akin to the principle of maximal entropy in probability theory.
Another related use of specificity is that a property of a valid inference is that it never increases
specificity. Thus, if we infer $P_2$ from $P_1$, $P_1 \mapsto P_2$, then it must be the case that
$\mathrm{Sp}(P_1) \ge \mathrm{Sp}(P_2)$. That is, inference doesn't allow us to increase information.

We now turn to another measure of uncertainty. Assume $V$ is $A$ and $V$ is $B$ are two
propositions in AR. We define the possibility of $V$ is $A$ given $V$ is $B$, denoted
$\mathrm{Poss}[V \text{ is } A \mid V \text{ is } B]$, as $\max_x[F(x)]$ where $F = A \cap B$. Possibility measures the degree to which two propositions
have some solution in common.

A related measure is the certainty measure. The certainty of $V$ is $A$ given $V$ is $B$ is
defined as $\mathrm{Cert}[V \text{ is } A \mid V \text{ is } B] = 1 - \mathrm{Poss}[V \text{ is } \bar{A} \mid V \text{ is } B]$. If $B$ is normal then
$\mathrm{Poss}[V \text{ is } A \mid V \text{ is } B] \ge \mathrm{Cert}[V \text{ is } A \mid V \text{ is } B]$. If $B$ is a singleton, $B = \{x^*\}$, then
$\mathrm{Poss}[V \text{ is } A \mid V \text{ is } B] = \mathrm{Cert}[V \text{ is } A \mid V \text{ is } B] = A(x^*)$.

We note that Poss is a measure of the degree of intersection of $A$ and $B$ and $\mathrm{Cert}[A \mid B]$ is a
measure of the degree to which $B$ is contained in $A$. An important interpretation of the measures
of possibility and certainty is respectively as the upper and lower bounds of the truth of the
statement $V_a$ is $A$ given the knowledge that $V_b$ is $B$.
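A small numerical sketch of the two measures follows. It assumes that certainty is computed from the possibility of the negation of A, which is the reading under which the singleton property stated in the text (both measures equal $A(x^*)$) holds. The function names and the age data are hypothetical.

```python
def poss(A, B, universe):
    """Possibility of 'V is A' given 'V is B': height of the intersection."""
    return max(min(A.get(x, 0.0), B.get(x, 0.0)) for x in universe)

def cert(A, B, universe):
    """Certainty of 'V is A' given 'V is B': 1 - Poss[not A | B] (assumed reading)."""
    not_A = {x: 1.0 - A.get(x, 0.0) for x in universe}
    return 1.0 - poss(not_A, B, universe)

ages = [20, 21, 22]
A = {20: 1.0, 21: 0.5, 22: 0.25}

# B is a singleton: both measures reduce to A(x*).
B_single = {21: 1.0}
poss_single = poss(A, B_single, ages)
cert_single = cert(A, B_single, ages)

# B is normal but not a singleton: possibility bounds certainty from above.
B_wide = {20: 1.0, 21: 1.0}
poss_wide = poss(A, B_wide, ages)
cert_wide = cert(A, B_wide, ages)
```

In the second case the gap between the two values reflects how far `B_wide` is from being fully contained in `A`.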

4.  Translation  of  Information 

An  important  step  in  the  construction  of  question-answer  systems  is  the  translation  of  our 
knowledge  from  statements  in  a  natural  language  into  propositions  in  AR.  A  study  of  Zadeh's 
[21,  22]  work  will  provide  a  large  collection  of  translation  rules.  Here  we  discuss  some  of  these. 

As  we  have  already  indicated,  the  basic  statement  involves  the  association  of  a  fuzzy 
subset  with  a  variable.  Here  we  shall  see  how  to  build  more  complex  representations  using  this 
basic  building  block. 

Assume $P$ is a statement that has representation as $V_a$ is $A$ and $Q$ is a statement with
representation as $V_b$ is $B$. Using these we obtain the following translation rules.

•  not ($P$) is represented as $V_a$ is $\bar{A}$, where $\bar{A}$ is the negation of $A$, $\bar{A}(x) = 1 - A(x)$.

•  $P$ and $Q$ is represented as $V_a$ is $A$ $\times$ $V_b$ is $B$. In the special case where $V_a = V_b$ we have
$V_a$ is $A \cap B$. In another special case, where $V_a$ and $V_b$ are disjoint, we get $(V_a, V_b)$ is $D$ where
$D(x, y) = \min(A(x), B(y))$. Another case which will be useful is where we have partial overlap,
that is $V_a = (V_1, V_2)$ and $V_b = (V_1)$. If we let $X$ and $Y$ respectively be the domains of $V_1$ and
$V_2$ then we get $(V_1, V_2)$ is $E$ where $E(x, y) = \min[A(x, y), B(x)]$.

•  $P$ or $Q$ is translated as $V_a$ is $A$ $\perp$ $V_b$ is $B$. In the special case where $V_a = V_b$ we get
$V_a$ is $D$ where $D = A \cup B$, $D(x) = \max[A(x), B(x)]$. In the case when $V_a$ and $V_b$ are
disjoint we get $(V_a, V_b)$ is $E$ where $E(x, y) = \max[A(x), B(y)]$.

•  If $P$ then $Q$ is represented as $V_a$ is $\bar{A}$ $\perp$ $V_b$ is $B$. In the special case where $V_a$ and $V_b$ are
disjoint we get $(V_a, V_b)$ is $E$ where $E(x, y) = \max[\bar{A}(x), B(y)]$.

In the preceding we have discussed the representation of the kinds of knowledge whose
structure is very close to the class of propositions appearing in classical logic. The knowledge
that can be represented in the AR framework does not have to be restricted to these kinds of
classical examples. More generally, the AR paradigm of translating knowledge into statements
constraining the values of the relevant variables can be applied to a large variety of knowledge. A
simple illustration of this involves the knowledge that Faye and Lotfi are close in age. If $V_a$
represents the variable Faye's age and if $V_b$ represents the variable Lotfi's age then we represent
this knowledge as $(V_a, V_b)$ is CLOSE. Here CLOSE is a fuzzy relationship such that for each
pair of ages $x$ and $y$ the membership grade CLOSE$(x, y)$ indicates the degree to which $x$ and $y$
satisfy the condition of being close.

An  important  class  of  knowledge  involves  functional  relationships.  Many  scientific 
disciplines  represent  their  knowledge  in  this  manner.  In  addition,  modern  technological  tools 
such  as  data  mining  use  functional  relationships  to  express  the  knowledge  they  discover  [23]. 
Tools  are  available  for  translating  functional  relationships  into  propositions  in  AR.  Let  us  look  at 
some  of  these. 

Let $V$ and $U$ be two disjoint variables with domains $X$ and $Y$. A functional relationship
$U = f(V)$ is a mapping $f: X \to Y$ such that $y = f(x)$; it assigns to every value $x$ in $X$ a value $y$ in $Y$. We
can represent this knowledge in the AR framework as a proposition of the form $(V, U)$ is $F$, where
$F$ is a relationship on the space $X \times Y$ such that $F = \bigcup_{x \in X} \{(x, f(x))\}$. Thus $F$ is the union of all
pairs $(x, f(x))$ which are solutions to $U = f(V)$.

A special type of functional relationship is that commonly known as a fuzzy model or
fuzzy graph [24]. We denote this type of fuzzy relationship as $V_b = F(V_a)$, where $V_a$ and $V_b$ are
disjoint variables with domains $X$ and $Y$. In this case, instead of knowing exact pairs of points, we
have a collection of fuzzy solution points $P_i$, and the relationship is the disjunction of these
points. A fuzzy solution point $P_i$ is characterized by a pair $\langle q_i, r_i \rangle$, with $q_i = V_a$ is $A_i$ and $r_i = V_b$ is
$B_i$, and is defined as the conjoin of the components of the pair, $V_a$ is $A_i$ $\times$ $V_b$ is $B_i$; thus it is a
proposition of the form $(V_a, V_b)$ is $P_i$. Since the relationship between these fuzzy solution points is a
disjunction, the overall functional relationship is $F = P_1$ or $P_2$ or $\ldots$ or $P_n$. More formally, $V_b =
F(V_a)$ is represented as

$F = \langle q_1, r_1 \rangle \perp \langle q_2, r_2 \rangle \perp \cdots \perp \langle q_n, r_n \rangle$.

This results in a proposition of the form $(V_a, V_b)$ is $P_1 \perp P_2 \perp \cdots \perp P_n$.

Another feature of the AR framework is the ability to include in the representation
knowledge about the certainty, or more generally an importance weight, of a proposition. This is
accomplished through the process of certainty (or importance) qualification. A statement

$V$ is $A$ is $\alpha$ certain

can be expressed as the proposition $V$ is $B$ where $B(x) = \bar{\alpha} \vee A(x)$ and $\bar{\alpha} = 1 - \alpha$. Here it is assumed $\alpha \in [0, 1]$.
We see if $\alpha = 1$ then $B = A$ and if $\alpha = 0$ then $B = X$, the whole space. The case of $\alpha = 0$ results
in the proposition expressing no knowledge.


5.  Inference  and  Knowledge  Manipulation 

We now introduce the basic rules of inference available in AR [25]. In the following we
use the notation $(P_1, \ldots, P_n) \mapsto Q$ to indicate that we can infer $Q$ from the collection $(P_1, \ldots,
P_n)$.

The first rule of inference is called the entailment principle. This rule says that if $P_1$ and
$P_2$ are two propositions such that $P_1 \subseteq P_2$ then from $P_1$ we can infer $P_2$. We formally express
this as inference rule 1:

IR-1: $(P_1) \mapsto P_2$ if $P_1 \subseteq P_2$

One important application is the following. If $A$ and $B$ are fuzzy subsets over the same
universe such that $A \subseteq B$ then from the statement $V$ is $A$ we can infer $V$ is $B$.

A second important application of this inference rule is related to the process of projection.

Definition: Let $V_a$ and $V_b$ be two variables with domains $X$ and $Z$ respectively. The projection
of $V_a$ is $A$ on $V_b$, denoted $\mathrm{Proj}_{V_b}[V_a \text{ is } A]$, is defined as $V_b$ is $B$ where $B(z) = \max_Q[A(x)]$.
Here $Q$ is the subset of elements in the domain of $V_a$, $X$, which have the same value as $z$ for the atomic
variables which $V_a$ and $V_b$ have in common (see footnote 7). (If $V_a$ and $V_b$ are disjoint then $B(z) = 1$ for all $z$.)

Footnote 7: Assume $V_1$, $V_2$ and $V_3$ are atomic variables with domains $X_1$, $X_2$ and $X_3$ and let $V_a = (V_1, V_2)$ and $V_b = (V_1,
V_3)$. If $z = (x_1, x_3)$ then $Q$ is the subset of $X_1 \times X_2$ consisting of all points $(x_1, y)$ where $y$ is any point in $X_2$.

One important use of projection is the case where $V = (V_1, \ldots, V_n)$, where all $V_i$ are
atomic variables and we desire $\mathrm{Proj}_{V_1}[V \text{ is } A]$. Let us first consider the special case $V = (V_1, V_2)$
and we desire $\mathrm{Proj}_{V_1}[V \text{ is } A]$. We assume the domain of $V_1$ is $X$ and the domain of $V_2$ is $Y$. In this
case, we get $V_1$ is $B$ where $B(x_1) = \max_{y \in Y}[A(x_1, y)]$. More generally, if $V = (V_1, \ldots, V_n)$ and
$X_i$ is the domain of $V_i$ then $\mathrm{Proj}_{V_1}[V \text{ is } A]$ = $V_1$ is $B$ where

$B(x) = \max_{(x_2, \ldots, x_n)} [A(x, x_2, x_3, \ldots, x_n)]$

$B(x)$ is the maximal membership grade in $A$ of any tuple containing $x$.

The usefulness of the projection operator lies in the fact [25] that if $P_1$ = $V$ is $A$ and
$P_2$ = $V_a$ is $\mathrm{Proj}_{V_a}[V \text{ is } A]$ then $P_1 \subseteq P_2$. Thus using the entailment principle, IR-1, given $V$ is $A$
we can infer $V_a$ is $B$ where $B = \mathrm{Proj}_{V_a}[V \text{ is } A]$. So we see that projection allows us to infer
information about a single variable from information about joint variables. This is sometimes
called marginalization.

Observation: Assume $(V_a, V_b)$ is $R$ where $R$ is a subset of $X \times Y$. If $R$ is such that for each
$x \in X$ there exists a $y \in Y$ such that $R(x, y) = 1$ then $\mathrm{Proj}_{V_a}[(V_a, V_b) \text{ is } R] = B$ is such that $B(x) = 1$ for
all $x$. Hence all we infer is $V_a$ is $X$, which is essentially no knowledge.
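Projection (marginalization) amounts to a maximum over the variables that are dropped. The sketch below assumes a joint membership function stored as a dictionary over tuples and projects onto a subset of the joint variables; the name `project` and the sample relations are hypothetical.

```python
def project(joint_vars, A, target_vars):
    """Project 'joint_vars is A' onto target_vars by maximizing over the rest."""
    idx = [joint_vars.index(v) for v in target_vars]
    B = {}
    for z, grade in A.items():
        key = tuple(z[i] for i in idx)
        B[key] = max(B.get(key, 0.0), grade)
    return B

joint_vars = ("V1", "V2")
A = {
    ("x1", "y1"): 0.25, ("x1", "y2"): 0.75,
    ("x2", "y1"): 0.5,  ("x2", "y2"): 0.0,
}
B1 = project(joint_vars, A, ("V1",))
B2 = project(joint_vars, A, ("V2",))

# The Observation: every x has some y with grade 1, so the projection is all ones.
R = {
    ("x1", "y1"): 1.0, ("x1", "y2"): 0.0,
    ("x2", "y1"): 0.0, ("x2", "y2"): 1.0,
}
B_R = project(joint_vars, R, ("V1",))
```

Each joint grade is bounded by the corresponding projected grade, which is the containment that lets the entailment principle justify the inference.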

The second rule of inference relates to the combination of multiple propositions. Assume
we have a collection of propositions $P_i$ for $i = 1$ to $n$. Since, as we have previously indicated, a
proposition $V_a$ is $A$ places a constraint on the allowable values of the associated variable, the total
effect of a collection of propositions can be seen to be the conjunction (anding) of the individual
propositions. This is the basis of the second inference rule:

IR-2: $\{P_1, P_2, \ldots, P_n\} \mapsto P_1 \times P_2 \times \cdots \times P_n$

We are now in a position to understand the process of reasoning, obtaining inferences
from a knowledge base consisting of the collection $\{P_1, P_2, \ldots, P_n\}$. The set of propositions in
our knowledge base induces a combined restriction which is the conjunction of all propositions in
the knowledge base, $KB \equiv P_1 \times P_2 \times \cdots \times P_n$. Thus our knowledge base is also a proposition, $V$
is $A$, which is the conjunction of all pieces of knowledge.

The use of IR-2, the conjunction of all the individual pieces of knowledge, along with
IR-1, the entailment principle, provides the basis of the reasoning mechanism.

Basic Inference: Proposition $P$ is inferable from a KB of propositions $P_1, \ldots, P_n$, denoted $(P_1, \ldots,
P_n) \mapsto P$, if $P_1 \times P_2 \times \cdots \times P_n \subseteq P$.

It is important to initially emphasize that the inference requires that we must use the whole
knowledge base. However, as we shall subsequently see, in some cases we need not use the
whole knowledge base to make inferences. This simplification results in making the inference process
tractable and useful. In cases in which we must use the whole knowledge base, the process of
reasoning becomes particularly onerous and often impractical.

We now introduce an important idea related to the concept of monotonicity.

Definition: We say that a proposition $Q$ is monotonic if for any proposition $P$ = $V$ is $A$ it is
always the case that $P \times Q \subseteq P$. If $Q$ doesn't satisfy this condition we call it a non-monotonic
proposition.

All the propositions that we have introduced so far are monotonic. As we shall
subsequently see, non-monotonicity often arises in the representation of common sense
knowledge.

In  the  following  we  shall  assume  all  the  propositions  in  the  knowledge  base  are 
monotonic,  we  have  a  monotonic  knowledge  base.  The  assumption  of  monotonicity  allows  for  a 
great  simplification  in  the  reasoning  process.  It  is  the  basis  of  the  type  of  deduction  that  is 
common  in  reasoning. 

Let us understand the great significance of monotonicity. The assumption of monotonicity
implies that if $(P_1, \ldots, P_n) \mapsto P$ then $(P_1, \ldots, P_n, P_{n+1}) \mapsto P$. It means that adding any
proposition doesn't cause us to have to withdraw any inference. More pragmatically, it means that
if we infer a piece of knowledge from some subset of the knowledge base we are sure it is valid
for the whole KB.

More specifically, the assumption of monotonicity implies the following. If $\{P_1, \ldots, P_n\}$ is
our knowledge base and if $\{\hat{P}_1, \ldots, \hat{P}_q\}$ and $\{\tilde{P}_1, \ldots, \tilde{P}_r\}$ are two subsets of the KB then:

M1. If $\{\hat{P}_1, \ldots, \hat{P}_q\} \mapsto P$ then $\{P_1, \ldots, P_n\} \mapsto P$.

M2. If $\{\hat{P}_1, \ldots, \hat{P}_q\} \mapsto \hat{P}$, $\{\tilde{P}_1, \ldots, \tilde{P}_r\} \mapsto \tilde{P}$ and if $(\hat{P}, \tilde{P}) \mapsto P$ then $(P_1, \ldots, P_n) \mapsto P$.

This property allows us to use deduction and take advantage of various types of local
reasoning patterns, such as modus ponens, and proceed in a step by step fashion to obtain
complex inferences. This situation allows for a localization which greatly simplifies the inference
process. In the following we shall refer to a subset of our knowledge base as a local knowledge
base. The observation of the preceding is that we can make valid inferences using local
knowledge bases.

6.  Protoforms 

Recently, Zadeh introduced the idea of protoforms [10, 11, 26]. These protoforms are
local reasoning patterns based on the structure of the component propositions. A protoform
consists of a collection $P_1, \ldots, P_q$ of premises and a consequent $Q$ and is of the nature $(P_1, \ldots,
P_q) \mapsto Q$. For the most part the justification of these protoforms is based upon application of
conjunction (IR-2) and entailment (IR-1). The importance of these protoforms is that if we can
find a local knowledge base (subset) of our knowledge base consisting of propositions whose
structure matches the premise of a protoform, then we can infer the consequent. Furthermore,
given properties M1 and M2 above, these protoforms can be independent parts of a complex
deductive reasoning chain in monotonic knowledge bases.

In the following we shall identify some protoforms available in AR. Unless otherwise
indicated we assume all fuzzy subsets are normal. We use the notation PF-K to indicate
protoform #K. As in the preceding, the premises are identified by $P_j$ and the consequent is
denoted $Q$.

PF-1 provides a basic protoform for inferring information about a component variable
from a joint relation.

PF-1 Projection Protoform

$P_1$: $(V_1, V_2, \ldots, V_q)$ is $H$

$Q$: $V_1$ is $D$ where $D = \mathrm{Proj}_{V_1}((V_1, V_2, \ldots, V_q) \text{ is } H)$

We recall $D(x_1) = \max_y[H(x_1, y)]$ where $y = (x_2, \ldots, x_q) \in Y$ and $Y = X_2 \times X_3 \times \cdots \times X_q$.

PF-2 is a fundamental protoform. It provides the basic protoform for making inferences
from a joint relationship and companion propositions about its components. It essentially
involves a conjunction of the premises followed by a projection onto the relevant variable.

PF-2 Conjunction/Projection Protoform

$P_1$: $V_1$ is $B_1$
$P_2$: $V_2$ is $B_2$
...
$P_q$: $V_q$ is $B_q$
$P_{q+1}$: $(V_1, V_2, \ldots, V_q)$ is $H$

$Q$: $V_j$ is $D$ ($D = \mathrm{Proj}_{V_j}((V_1, V_2, \ldots, V_q) \text{ is } E)$)

Here $E(x_1, \ldots, x_q) = \min[B_1(x_1), B_2(x_2), \ldots, B_q(x_q), H(x_1, \ldots, x_q)]$. Note that if any of
the $P_1, \ldots, P_q$ are missing this still holds; we can replace $B_j$ by $X_j$.

The  following  protoforms  can  be  seen  as  important  special  cases  of  the  preceding. 

PF-3 Modus Ponens

$P_1$: If $V$ is $A$ then $U$ is $F$
$P_2$: $V$ is $B$

$Q$: $U$ is $D$ where $D(y) = \mathrm{Poss}[\bar{A}|B] \vee F(y)$ and $\mathrm{Poss}[\bar{A}|B] = \max_x[\bar{A}(x) \wedge B(x)]$

This PF is a special case of PF-2. Here $P_1$ is represented as $(V, U)$ is $H$ and

$H(x, y) = \bar{A}(x) \vee F(y)$

$E(x, y) = (\bar{A}(x) \wedge B(x)) \vee (F(y) \wedge B(x))$

$D(y) = \max_x[E(x, y)] = \mathrm{Poss}[\bar{A}|B] \vee F(y)$

Note 1: If $B$ is subnormal, $\max_x[B(x)] = b$, then $D(y) = \mathrm{Poss}[\bar{A}|B] \vee (F(y) \wedge b)$.

Note 2: Since $\mathrm{Cert}[A|B] = 1 - \mathrm{Poss}[\bar{A}|B]$, we can express $D(y) = (1 - \mathrm{Cert}[A|B]) \vee F(y)$.
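The claim that modus ponens is a special case of PF-2 can be checked numerically on a small example. The sketch below assumes finite domains, dictionary-based fuzzy sets, and a normal $B$. The function names and the data are illustrative assumptions only.

```python
def complement(a, domain):
    """Membership grades of the complement of A over the given domain."""
    return {x: 1.0 - a.get(x, 0.0) for x in domain}

def poss(a, b, domain):
    """Poss[A|B] = max over x of min(A(x), B(x))."""
    return max(min(a.get(x, 0.0), b.get(x, 0.0)) for x in domain)

def modus_ponens(a, f, b, x_domain, y_domain):
    """Closed form D(y) = Poss[not-A|B] v F(y); assumes B is normal."""
    level = poss(complement(a, x_domain), b, x_domain)
    return {y: max(level, f.get(y, 0.0)) for y in y_domain}

def modus_ponens_via_pf2(a, f, b, x_domain, y_domain):
    """Same inference the long way: E(x, y) = H(x, y) ^ B(x), then max over x."""
    not_a = complement(a, x_domain)
    return {
        y: max(min(max(not_a[x], f.get(y, 0.0)), b.get(x, 0.0)) for x in x_domain)
        for y in y_domain
    }

# Hypothetical data: B is normal because B(1) = 1.
X = [1, 2, 3]
Y = ["u", "v"]
A = {1: 1.0, 2: 0.5}
B = {1: 1.0, 2: 0.25}
F = {"u": 1.0, "v": 0.1}

D = modus_ponens(A, F, B, X, Y)   # {'u': 1.0, 'v': 0.25}
```

Both routes give the same $D$ here. For a subnormal $B$ the closed form would need the extra factor $b$ from Note 1.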

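PF-4 Weighted Modus Ponens

$P_1$: If $V$ is $A$ then $U$ is $F$ is $\alpha$-certain
$P_2$: $V$ is $B$

$Q$: $U$ is $D$ where $D(y) = \mathrm{Poss}[\bar{A}|B] \vee \bar{\alpha} \vee F(y)$

Here $P_1$, which reads "$(V, U)$ is $H$ is $\alpha$-certain", becomes $(V, U)$ is $G$ where $G(x, y) = H(x, y) \vee \bar{\alpha}$ and $\bar{\alpha} = 1 - \alpha$.

Note 3: If $\alpha = 0$ then $D(y) = 1$.

Note 4: If $B$ is subnormal, $\max_x[B(x)] = b$, then $D(y) = \mathrm{Poss}[\bar{A}|B] \vee (b \wedge (\bar{\alpha} \vee F(y)))$.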

PF-5 Modus Tollens

$P_1$: If $V$ is $A$ then $U$ is $F$
$P_2$: $U$ is $C$

$Q$: $V$ is $G$ where $G(x) = \bar{A}(x) \vee \mathrm{Poss}[F|C]$

Here $E(x, y) = (\bar{A}(x) \vee F(y)) \wedge C(y) = (\bar{A}(x) \wedge C(y)) \vee (F(y) \wedge C(y))$

$G(x) = \max_y[E(x, y)] = \bar{A}(x) \vee \mathrm{Poss}[F|C]$

Note 5: If $\mathrm{Poss}[F|C] = 1$ then $G(x) = 1$ for all $x$ and we infer no information.
Note 6: If $C$ is subnormal, $\max_y[C(y)] = c$, then $G(x) = (\bar{A}(x) \wedge c) \vee \mathrm{Poss}[F|C] \le c$.

The above protoforms were essentially special cases of PF-2. We now turn to another
special case of it.

PF-6 Extension Principle

$P_1$: $U = f(V)$
$P_2$: $V$ is $A$

$Q$: $U$ is $f(A)$

In this case $U = f(V)$ is translated as $(V, U)$ is $H$ where $H = \bigcup_{x \in X} \{1/(x, f(x))\}$ and, as in
the preceding, $G(x, y) = H(x, y) \wedge A(x)$. Here again $D(y) = \max_x[H(x, y) \wedge A(x)]$; however, in
this case $D(y) = \max_x[H(x, y) \wedge A(x)] = \max_{x \text{ s.t. } f(x) = y}[A(x)]$. Hence $D = f(A)$.

Note: More generally if $U = f(V_1, \ldots, V_q)$ and $V_i$ is $A_i$ for all $i$ then we infer $U$ is $f(A_1, \ldots, A_q)$
and $f(A_1, \ldots, A_q) = D$ where $D(y) = \max_{(x_1, \ldots, x_q) \text{ s.t. } f(x_1, \ldots, x_q) = y}[A_1(x_1) \wedge \cdots \wedge A_q(x_q)]$.
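A brief sketch of the extension principle on finite domains follows. It assumes fuzzy subsets are dictionaries whose keys are the elements with nonzero membership. The helper names `extend` and `extend_n` are our own labels, and the sample sets are hypothetical.

```python
from itertools import product

def extend(f, a):
    """D(y) = max of A(x) over all x with f(x) = y."""
    d = {}
    for x, grade in a.items():
        y = f(x)
        d[y] = max(d.get(y, 0.0), grade)
    return d

def extend_n(f, fuzzy_sets):
    """Several arguments: D(y) = max over (x_1..x_q) with f(x_1..x_q) = y of min_i A_i(x_i)."""
    d = {}
    for point in product(*(s.keys() for s in fuzzy_sets)):
        grade = min(s[x] for s, x in zip(fuzzy_sets, point))
        y = f(*point)
        d[y] = max(d.get(y, 0.0), grade)
    return d

# Squaring a fuzzy number "about zero": x and -x map to the same y.
A = {-2: 0.3, -1: 0.8, 0: 1.0, 1: 0.6, 2: 0.1}
D_square = extend(lambda x: x * x, A)                 # {4: 0.3, 1: 0.8, 0: 1.0}

# Adding two fuzzy numbers.
A1 = {1: 1.0, 2: 0.5}
A2 = {1: 0.7, 2: 1.0}
D_sum = extend_n(lambda u, v: u + v, [A1, A2])        # {2: 0.7, 3: 1.0, 4: 0.5}
```

The squaring example shows why the maximum is needed: two different $x$ values collapse onto the same $y$, and the larger grade wins.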


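PF-6b Inverse Extension

$P_1$: $U = f(V)$
$P_2$: $U$ is $B$

$Q$: $V$ is $f^{-1}(B)$

Again, $U = f(V)$ is translated as $(V, U)$ is $H$ where $H = \bigcup_{x \in X} \{1/(x, f(x))\}$. However,
here $G(x, y) = H(x, y) \wedge B(y)$ and $D(x) = \max_y[H(x, y) \wedge B(y)]$. In this case
$D(x) = \max_{y \text{ s.t. } f^{-1}(y) = x}[B(y)]$, that is, $D(x) = B(f(x))$, since $H(x, y)$ is nonzero only at $y = f(x)$.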


PF-7 Fuzzy Systems Modeling

$P_1$: Fuzzy Systems Model $(U, V)$ is $(A_1 \cap B_1) \cup (A_2 \cap B_2) \cup \cdots \cup (A_n \cap B_n)$
$P_2$: $U$ is $E$

$Q$: $V$ is $G$ where $G(y) = \max_i[\mathrm{Poss}[A_i|E] \wedge B_i(y)]$
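The PF-7 consequent can be computed directly from the rule base. The following is a minimal sketch, assuming each rule is a pair of dictionary-based fuzzy sets $(A_i, B_i)$. The temperature and heater labels are invented for illustration.

```python
def poss(a, b):
    """Poss[A|B] = max over x of min(A(x), B(x)); missing keys count as grade 0."""
    keys = set(a) | set(b)
    return max(min(a.get(x, 0.0), b.get(x, 0.0)) for x in keys)

def fuzzy_model_infer(rules, e, y_domain):
    """G(y) = max over rules i of min(Poss[A_i|E], B_i(y))."""
    return {
        y: max(min(poss(a, e), b.get(y, 0.0)) for a, b in rules)
        for y in y_domain
    }

# Hypothetical two-rule model: U is temperature, V is heater setting.
rules = [
    ({"cold": 1.0, "mild": 0.4}, {"high": 1.0, "medium": 0.3}),
    ({"mild": 0.6, "hot": 1.0}, {"low": 1.0, "medium": 0.5}),
]
E = {"mild": 1.0}   # observed input: U is "mild"

G = fuzzy_model_infer(rules, E, ["high", "medium", "low"])
# G == {'high': 0.4, 'medium': 0.5, 'low': 0.6}
```

Each rule fires to the degree $\mathrm{Poss}[A_i|E]$ (0.4 and 0.6 here), and that degree caps the rule's contribution to $G$.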
7. Knowledge Trees


In  the  following  we  shall  describe  an  approach  to  question  answering  systems  using  the 
concept  of  a  knowledge  tree  [12,  13].  As  we  shall  see,  a  knowledge  tree  is  a  structure  generated 
from  a  knowledge  base  in  response  to  a  query.  It  can  be  seen  as  a  mobilization  of  the  knowledge 
base  to  address  a  particular  task.  At  a  meta  level  it  provides  a  plan  for  answering  the  question. 
We shall denote our knowledge base as KB and initially we assume it consists of a collection of
monotonic propositions. Associated with the KB is a collection of atomic variables $V_1, \ldots, V_n$
and their associated domains, $X_1, \ldots, X_n$. We view the KB as consisting of two classes of
propositions. The first class, which we denote as D, consists of all the atomic propositions,
that is, propositions of the form $V_j$ is $A_j$ where $A_j$ is a fuzzy subset of the universe of $V_j$. Among the
propositions in D are the atomic propositions $V_i$ is $X_i$; these just specify the domain of $V_i$. The
second class consists of those that involve joint variables; they specify relationships between variables.
We denote the collection of these as J.

We shall concern ourselves here specifically with questions that ask about the value of
some atomic variable $V_i$. Our procedure for answering a question will be a two-phase process.
The first phase is the generation of a knowledge tree in response to the posing of a question. The
second phase is the contraction of the tree to find the value of the variable queried about. The first
phase can be seen as generating the direction and plan for answering the question.

Before proceeding let us clarify the tree terminology we shall use. A node $n_a$ will be
called an ancestor of the node $n_b$ if $n_a$ appears anywhere on the path from the initial node of the
tree to $n_b$. In this case $n_b$ is called a descendant of $n_a$. A node $n_a$ is called the parent of node $n_b$ if
$n_b$ emanated directly from $n_a$. In this case $n_b$ is called the child of $n_a$. The terms immediate
ancestor and immediate descendant are equivalent to parent and child.

The following is the basic procedure for the construction of the knowledge tree in
response to a query about an atomic variable V in the case where the KB is as described in the
preceding:

Basic  Tree  Generation  Algorithm 

1. The posing of a question in the form "? V is" initiates the knowledge tree with a
triangle labeled V.


2. The appearance, anywhere in the knowledge tree, of a triangle labeled with a
variable not having itself as an ancestor causes the emanation from the triangle of a
family of immediate successor nodes:

a. One corresponding to each data proposition in the knowledge base subset D
involving the label of the triangle. A successor node corresponding to an element from
D will be manifested by a rectangle labeled by the name of the data proposition, i.e.
[$d_p$: V is A].

b. One corresponding to each proposition in J having the label of the parent
triangle as one of its component variables. A successor node corresponding to an
element from J is manifested by a circle labeled by the name of the proposition.

c. If there are no appropriate elements from either a or b this causes the emanation
of a rectangle labeled by the domain of the variable, [V is X].


3. The appearance anywhere in the knowledge tree of a triangle labeled with a variable
having itself as an ancestor causes the emanation from that node of one successor
node, a rectangle labeled by the variable's domain, [V is X].


4. The appearance of a node corresponding to an element from J, a labeled circle,
causes the emanation from it of a family of successor triangle nodes each labeled by a
variable appearing in the proposition denoted by the circle.

5. The appearance of a labeled rectangle terminates that branch of the tree.

6.  The  construction  of  the  knowledge  tree  is  completed  when  all  branches  terminate  in 
rectangles. 

7.  The  completion  of  the  knowledge  tree  generation  phase  initiates  the  evaluation 
phase. 
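The generation steps above can be sketched as a short recursive procedure. This is a minimal illustration under our own assumptions: nodes are plain dictionaries, data propositions are (name, variable) pairs, joint propositions are (name, variables) pairs, and the domain rectangle is labeled "V is X(V)". None of these encodings come from the report.

```python
def domain_leaf(var):
    """Rectangle carrying only the domain of the variable (steps 2c and 3)."""
    return {"kind": "rectangle", "label": f"{var} is X({var})"}

def build_tree(var, data_props, joint_props, ancestors=()):
    """Return the triangle node for `var`, expanded according to steps 2 to 5."""
    node = {"kind": "triangle", "label": var, "children": []}
    if var in ancestors:                       # step 3: variable is its own ancestor
        node["children"].append(domain_leaf(var))
        return node
    for name, prop_var in data_props:          # step 2a: one rectangle per element of D
        if prop_var == var:
            node["children"].append({"kind": "rectangle", "label": name})
    for name, variables in joint_props:        # step 2b: one circle per element of J
        if var in variables:
            circle = {"kind": "circle", "label": name, "children": []}
            for member in variables:           # step 4: a triangle per component variable
                circle["children"].append(
                    build_tree(member, data_props, joint_props, ancestors + (var,)))
            node["children"].append(circle)
    if not node["children"]:                   # step 2c: nothing known, use the domain
        node["children"].append(domain_leaf(var))
    return node

def leaves(node):
    """Labels of all terminating rectangles, left to right (steps 5 and 6)."""
    if node["kind"] == "rectangle":
        return [node["label"]]
    return [label for child in node["children"] for label in leaves(child)]

# Hypothetical knowledge base with three variables and two joint relations.
D = [("V1 is A1", "V1"), ("V2 is A2", "V2"), ("V3 is A3", "V3")]
J = [("(V1, V2) is B1", ("V1", "V2")), ("(V1, V3) is B2", ("V1", "V3"))]

tree = build_tree("V1", D, J)
```

Because every repeated variable is closed off by its domain rectangle, the recursion always stops. For this knowledge base the query on V1 yields nine terminating rectangles.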

In Figure 4 we illustrate a typical knowledge tree.

[Figure 4. Typical Knowledge Tree. Node labels recoverable from the figure: rectangles V1 is A1, V2 is A2, V3 is A3, V1 is X1, V2 is X2, V3 is X3; circles (V1, V2) is B1 and (V1, V3) is B2.]

Once  having  completed  the  construction  of  the  knowledge  tree  initiated  by  our  query  we 
must  next  apply  an  evaluation  procedure  to  calculate  the  value  of  the  desired  variable.  The 
evaluation  algorithm  is  based  upon  the  rules  of  approximate  reasoning  and  makes  considerable 
use  of  the  protoforms  introduced  earlier. 

Evaluation  Algorithm 

1. Any subtree consisting of a triangular node and a family of children which are all
rectangular nodes can be replaced by a rectangular node whose label is [V is H].
Here V is the label of the triangular node and H is the conjunction of the
values of the children of this triangular node. (It is an application of the conjunction
protoform.)

2. Any subtree consisting of a circular node, a relational proposition, and a family of
children all of which are rectangular nodes can be replaced by a rectangular node
whose label is $V_1$ is $G$ where $V_1$ is the label of the triangular node immediately
preceding the circular node. Typically this is accomplished by application of PF-2, the
Conjunction/Projection Protoform. However, more generally this may involve a two
step process. The first step is the application of the protoform associated with the
circular node and its children. The second step is an application of a projection onto $V_1$
of the proposition resulting from step 1.

3. The evaluation phase terminates when there exists one rectangular node whose label
is the desired variable, with its associated value as the answer to the question.
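The contraction of a tree can be sketched recursively, collapsing each subtree bottom-up. The sketch below assumes finite domains and our own tuple encoding of nodes: ("rect", variable, fuzzy set), ("tri", variable, children), and ("circle", variables, relation, child triangles). Circles are evaluated with PF-2. The tree and the numbers are hypothetical.

```python
from itertools import product

def evaluate(node, domains, parent=None):
    """Contract a subtree to a fuzzy set on the variable of the nearest triangle."""
    kind = node[0]
    if kind == "rect":                          # already a value
        return node[2]
    if kind == "tri":                           # step 1: conjunction of the children
        _, var, children = node
        values = [evaluate(c, domains, parent=var) for c in children]
        return {x: min(v.get(x, 0.0) for v in values) for x in domains[var]}
    # step 2: circle, handled by PF-2 (conjunction, then projection onto the parent)
    _, variables, relation, children = node
    marginals = [evaluate(c, domains) for c in children]
    axis = variables.index(parent)
    out = {x: 0.0 for x in domains[parent]}
    for point in product(*(domains[v] for v in variables)):
        grade = relation.get(point, 0.0)
        for x, b in zip(point, marginals):
            grade = min(grade, b.get(x, 0.0))
        out[point[axis]] = max(out[point[axis]], grade)
    return out

domains = {"V1": ["a", "b"], "V2": ["p", "q"]}

def whole(var):
    """Rectangle 'V is X': every element fully possible."""
    return ("rect", var, {x: 1.0 for x in domains[var]})

A2 = {"p": 0.2, "q": 1.0}
H = {("a", "p"): 1.0, ("a", "q"): 0.3, ("b", "p"): 0.5, ("b", "q"): 0.8}

# Tree for the query "? V1 is" with KB = {V2 is A2, (V1, V2) is H}.
inner = ("circle", ("V1", "V2"), H,
         [("tri", "V1", [whole("V1")]), ("tri", "V2", [whole("V2")])])
tree = ("tri", "V1", [
    ("circle", ("V1", "V2"), H, [
        ("tri", "V1", [whole("V1")]),
        ("tri", "V2", [("rect", "V2", A2), inner]),
    ]),
])

answer = evaluate(tree, domains)   # {'a': 0.3, 'b': 0.8}
```

The evaluation ends with a single fuzzy set on V1, matching step 3. The general two step variant of step 2, with a protoform other than PF-2 attached to a circle, is not covered by this sketch.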

The knowledge tree forms a useful framework for formulating the question answering
process. As we see, it takes a knowledge base and restructures the information in it to address the
question being asked. We briefly point out one feature of the knowledge tree approach which,
while we shall not currently pursue it, is a very useful property. In particular the knowledge tree
can point us in a direction to seek additional information not in our KB that may help us in getting
a good answer. A simple illustration of this is the following. Assume we are interested in finding
the value of $V_1$. Assume in our KB we have a relation $(V_1, V_2)$ is $G$. This relation will appear as
a descendant of our query to find $V_1$. Let us further assume we have no knowledge of $V_2$. Thus
this branch will terminate with a rectangle [$V_2$ is $X_2$]. If as a result of our working on the whole
tree we get an answer for $V_1$ that is not sufficiently informative for the user we see that this
branch provides a potential path for improving our answer. It is telling us that if you are able to get
better information about $V_2$ you can possibly improve your answer for $V_1$. At a meta level this is
essentially pointing in some direction for improving the quality.

8.  Non-Monotonic  Possibilistic  Propositions 

In order to take full advantage of the wide array of different types of information available
we must extend our capability to allow the inclusion in the knowledge base of the types of
commonsense knowledge that play a substantial role in human reasoning [6, 11]. Examples of this
are "generally it is difficult to find parking in New York City," "rainy weather usually results in
more traffic congestion" and "a cup of coffee costs about a dollar." This commonsense
knowledge often is not the kind of categorical knowledge that can be simply represented by the
type of AR proposition we have thus far considered. It is often tentative and context dependent.
This means it can often be involved in inferences that must be withdrawn in the face of
additional knowledge. This situation is formally referred to as non-monotonicity [27, 28]. In the
preceding we assumed all the knowledge to be categorical, of a monotonic type; this allowed us to
use deductive reasoning techniques. Here we shall begin to consider the handling of types of
propositions that can display non-monotonicity.

As we noted, non-monotonic propositions often come from including commonsense
knowledge [29]. One important class of these propositions is of a default type: they provide
values which we can associate with a variable in situations in which our other available
information about the variable doesn't provide a sufficiently precise value. More generally,
commonsense knowledge is frequently used to help in situations where we don't have enough
direct information about the situation. An important aspect of non-monotonic propositions is that
they are submissive to other categorical information. For example, assume that we are interested
in John's age and we know that he is in high school. As commonsense knowledge we know that
typically high school students are teenagers. Using this piece of commonsense knowledge our
best guess is to conclude that John is a teenager. However, assume now that we have the
additional knowledge that John is in his thirties. In this situation we want the default-based
inference that John is a teenager to defer to the categorical knowledge that John is in his thirties.
Implicit in this situation is some idea of a prioritization of knowledge. In particular the default
knowledge has a lower priority than categorical knowledge.

Another use of default propositions is as a simplifying tool. Here they are used to allow
us to make inferences without requiring us to obtain some antecedent pieces of knowledge with
complete certainty. In this use the possibility of some antecedent being true (it has not been
falsified) is sufficient grounds for making an inference. This use often acts to simplify our
inference process. An illustrative example of this is a rule about crossing streets: "if the light is
red and the driver is sober then assume they will stop." Often when using this we can't
realistically ascertain with complete certainty that a driver is sober; we usually settle for less
certainty: he doesn't manifest any drunkenness. Here we can cross without having to test each
driver.

The inclusion of non-monotonic propositions can add considerable complexity to the
reasoning process. This complexity results from the fact that if $P_1$ is non-monotonic and
$(P_1, P_2) \mapsto Q$ it may be the case that using the whole knowledge base, $(P_1, P_2, \ldots, P_n)$, doesn't
justify the inferring of $Q$. Thus, with use of non-monotonic knowledge, all local inferences may
not be valid. A deeper understanding of the situation can benefit from the introduction of the
concept of a "tail" associated with a non-monotonic proposition. Let $P = (P_1, P_2, \ldots, P_n)$ be our
knowledge base and let $P^*$ be a proposition in $P$. The tail $T_{P^*}$ associated with $P^*$ is a subset of $P$
consisting of the minimal number of propositions such that any proposition inferred from $P^*$
along with its tail is also inferable from the whole knowledge base (see footnote 8). The point being that
protoforms involving non-monotonic knowledge must include all members of its tail (see footnote 9). This can
lead to very complex protoforms that may involve many propositions and are contextually
dependent upon what propositions exist in the KB. We observe that categorical propositions have
empty tails.

An important class of commonsense knowledge can be represented by propositions that
have a possibilistic component. As we shall subsequently see this allows us to make inferences
using the fact that some antecedent is possibly true rather than certainly true. This softening of the
requirements brings with it the potential that we are wrong and hence introduces
non-monotonicity. In the following we shall look at some basic protoforms associated with
propositions having a possibilistic component.

8. We note that this is true for any subset that contains the tail, without introducing any other non-monotonic propositions not in
the tail.

9. Here we are assuming a constant knowledge base. In dynamic knowledge bases the addition of new knowledge can be seen as
opening the possibility that the original tail used in making an inference has been increased as a result of getting new information.

In order to get an understanding of a paradigm that plays a fundamental role in many
protoforms associated with possibilistic non-monotonic reasoning patterns we review the classical
modus ponens protoform. In modus ponens we have $(P_1, P_2) \mapsto Q$ where


$P_1$: If $V$ is $A$ then $U$ is $F$
$P_2$: $V$ is $B$

$Q$: $U$ is $D$ where $D(y) = (1 - \mathrm{Cert}[A|B]) \vee F(y)$

In this protoform we see that we require certainty that $V$ is $A$ in order to infer $U$ is $F$.
Note: It is interesting to note that we can view $U$ is $D$ as a kind of certainty weighted proposition:
$U$ is $F$ is $\alpha$-certain
where $\alpha = \mathrm{Cert}[A|B] = 1 - \mathrm{Poss}[\bar{A}|B]$.

We now consider a related non-monotonic protoform.

NMPF-1: Basic Non-Monotonic Possibilistic Protoform: $(P^*, P_2) \mapsto Q$ where (see footnote 10)

$P^*$: If $V$ is $A$ is possible then $U$ is $F$
$P_2$: $V$ is $B$

$Q$: $U$ is $E$ where $E(y) = (1 - \mathrm{Poss}[A|B]) \vee F(y)$

Here we see that we infer $F$ if $A$ is possible given we have $B$. Note in the usual modus ponens we
require $A$ is certain given we have $B$. Since $\mathrm{Poss}[A|B] \ge \mathrm{Cert}[A|B]$ (see footnote 11), this is easier to
satisfy.

A simple example illustrates this. Let $V$ have domain $X = \{a, b, c, d\}$. Let $A = \{a\}$ and let
$B = \{a, b\}$. In the case of modus ponens we get $D(y) = (1 - \mathrm{Cert}[A|B]) \vee F(y)$. Since
$\mathrm{Cert}[A|B] = 1 - \mathrm{Poss}[\bar{A}|B] = 1 - \max_x[\bar{A}(x) \wedge B(x)] = 0$, then $D(y) = (1 - \mathrm{Cert}[A|B]) \vee F(y) = 1$
and we can't infer anything. In the case of $P^*$ we get $E(y) = (1 - \mathrm{Poss}[A|B]) \vee F(y)$. Since
$\mathrm{Poss}[A|B] = \max_x[A(x) \wedge B(x)] = 1$, then $E(y) = F(y)$. Thus here we infer $U$ is $F$.

10. We are implicitly assuming here that $P_2$ is the tail of $P^*$.

11. This holds for normal sets.

However we also can easily illustrate the non-monotonicity associated with the use of $P^*$.
Assume that we get the additional piece of information $P_3$: $V$ is $G$ where $G = \{b, c\}$. Now
the tail associated with $P^*$ consists of both $P_2$ and $P_3$. In this case we must use $(P^*, P_2, P_3)$ and
our knowledge about $V$ is that $V$ is $H$ where $H = G \cap B = \{b\}$. In this case $\mathrm{Poss}[A|H] = 0$, hence
$E(y) = (1 - 0) \vee F(y) = 1$, so $E$ is the whole domain of $U$, and thus we can't infer anything.

A slightly more general version of the above protoform can be described. Let $P^*$ be as in
the preceding, If $V$ is $A$ is possible then $U$ is $F$. Let $T_{P^*}$ be the tail of $P^*$. Assume
$T_{P^*} \mapsto V$ is $H$; then $(P^*, T_{P^*}) \mapsto U$ is $E$ where $E(y) = (1 - \mathrm{Poss}[A|H]) \vee F(y)$. Let us formally
denote this protoform.

NMPF-1b: General Non-Monotonic Possibilistic Protoform: $(P^*, T_{P^*}) \mapsto Q$ where

$P^*$: If $V$ is $A$ is possible then $U$ is $F$
$T_{P^*}$: $\mapsto V$ is $H$

$Q$: $U$ is $E$ where $E(y) = (1 - \mathrm{Poss}[A|H]) \vee F(y)$
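The example with $X = \{a, b, c, d\}$, $A = \{a\}$, $B = \{a, b\}$ and $G = \{b, c\}$ can be replayed numerically. The sketch below is illustrative only: the consequent $F$ and its domain $\{u, v\}$ are invented, and the helper names are our own.

```python
X = ["a", "b", "c", "d"]

def crisp(members):
    """Crisp subset of X written as a membership function."""
    return {x: 1.0 if x in members else 0.0 for x in X}

def poss(a, b):
    """Poss[A|B] = max over x of min(A(x), B(x))."""
    return max(min(a[x], b[x]) for x in X)

def cert(a, b):
    """Cert[A|B] = 1 - Poss[not-A|B]."""
    not_a = {x: 1.0 - a[x] for x in X}
    return 1.0 - poss(not_a, b)

def intersect(*sets):
    """Pointwise minimum: the combined knowledge supplied by the tail."""
    return {x: min(s[x] for s in sets) for x in X}

def infer(level, f):
    """Consequent of the form level v F(y)."""
    return {y: max(level, grade) for y, grade in f.items()}

A = crisp({"a"})
B = crisp({"a", "b"})
G = crisp({"b", "c"})
F = {"u": 1.0, "v": 0.2}               # hypothetical consequent on the domain of U

D = infer(1.0 - cert(A, B), F)         # classical modus ponens: all 1, nothing inferred
E = infer(1.0 - poss(A, B), F)         # NMPF-1 with tail {P2}: E equals F
H = intersect(B, G)                    # tail {P2, P3}: V is H = {b}
E_tail = infer(1.0 - poss(A, H), F)    # NMPF-1b: all 1, the inference is withdrawn
```

Adding $P_3$ to the tail turns the informative conclusion `E` back into the vacuous `E_tail`, which is the non-monotonic behaviour described in the text.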

An issue here is the determination of the tail of $P^*$. In this case, it is easily obtained by querying
the knowledge base for the value of $V$. Thus, in this case the appearance of $P^*$ in a knowledge
tree would generate a triangle labeled by $V$. This triangle would in turn generate a whole subtree
which essentially constitutes $T_{P^*}$. One fundamental difference between this situation with $P^*$ and
that of the usual modus ponens with $P_1$ is the following. While the appearance of either of them
in a knowledge tree will generate a trailing subtree initiated by a triangle labeled with $V$, the
modus ponens situation will allow us to prune this subtree and still obtain a valid inference for
$U$, while the pruning of the subtree in the case of $P^*$ can lead to invalid
inferences.

An important class of non-monotonic possibilistic protoforms relates to reasoning with
rules that supply default or typical values. In reasoning with these types of propositions we must
often distinguish between what we know about a variable before we apply the proposition and
the value of the variable after we apply the protoform. Here our prior knowledge about the variable is
important in that it affects how a protoform works. This knowledge is generally obtained from
propositions with a higher priority (and a stronger certainty) than the protoform being
implemented. The following protoform relates to the reasoning with default or typical values.

NMPF-2: Typical Value Protoform

$P^*$: Typically ($V$ is $A$)
$P_2$: $V$ is $B$ (Prior Knowledge)

$Q$: $V$ is $D$ where $D(x) = B(x) \wedge (A(x) \vee (1 - \mathrm{Poss}[A|B]))$

In this case we represent the knowledge Typically ($V$ is $A$) as the possibilistic proposition:

If $V$ is $A$ is possible (with what we already know about $V$) then $V$ is $A$

Here what we already know is that $V$ is $B$. We see that we are taking as $D$ the conjunction of $B$
and "$A$ is credible given $B$"; this gives us

$D = B \cap (A \text{ is credible given } B)$
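To see how the typical value protoform defers to prior knowledge, consider a small numeric sketch. The age categories and membership grades below are invented for illustration (loosely following the earlier example about John's age), and `typical_value` is our own name for the formula $D(x) = B(x) \wedge (A(x) \vee (1 - \mathrm{Poss}[A|B]))$.

```python
def poss(a, b, domain):
    """Poss[A|B] = max over x of min(A(x), B(x))."""
    return max(min(a.get(x, 0.0), b.get(x, 0.0)) for x in domain)

def typical_value(a, b, domain):
    """NMPF-2: D(x) = B(x) ^ (A(x) v (1 - Poss[A|B]))."""
    level = 1.0 - poss(a, b, domain)
    return {x: min(b.get(x, 0.0), max(a.get(x, 0.0), level)) for x in domain}

AGES = ["child", "teen", "twenties", "thirties"]
TEEN = {"teen": 1.0}                        # Typically (Age is teen)

unknown = {x: 1.0 for x in AGES}            # no prior knowledge: B = X
thirties = {"thirties": 1.0}                # categorical prior that conflicts with A
partial = {"teen": 0.5, "twenties": 1.0}    # prior that only partly supports A

D_unknown = typical_value(TEEN, unknown, AGES)
# {'child': 0.0, 'teen': 1.0, 'twenties': 0.0, 'thirties': 0.0}: the default is adopted
D_thirties = typical_value(TEEN, thirties, AGES)
# {'child': 0.0, 'teen': 0.0, 'twenties': 0.0, 'thirties': 1.0}: the default is ignored
D_partial = typical_value(TEEN, partial, AGES)
# {'child': 0.0, 'teen': 0.5, 'twenties': 0.5, 'thirties': 0.0}: a partial narrowing
```

When $A$ is fully possible given $B$ the result is $B \cap A$. When $A$ is impossible given $B$ the default contributes nothing and the result is just $B$.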

A slightly more general version of the above protoform can be described. Let $P^*$ be as in
the preceding, Typically ($V$ is $A$). Let $T_{P^*}$ be the tail of $P^*$. Assume $T_{P^*} \mapsto V$ is $H$; then
$(P^*, T_{P^*}) \mapsto V$ is $E$ where $E(x) = H(x) \wedge (A(x) \vee (1 - \mathrm{Poss}[A|H]))$. We shall formally denote
this protoform as NMPF-2b: Typical Value Protoform.

Another closely related protoform is

NMPF-3: General Typical Value Protoform

$P^*$: If $V$ is $C$ and $U$ is $A$ is possible then $U$ is $A$
$P_2$: $V$ is $G$
$T_{P^*}$: $\mapsto U$ is $B$
$P_3$: $U^*$ is $B$

$Q$: $U$ is $D$ where $D(y) = B(y) \wedge (A(y) \vee (1 - \mathrm{Cert}[C|G]) \vee (1 - \mathrm{Poss}[A|B]))$

We now turn to the formulation of knowledge bases having default or other types of
non-monotonic knowledge. As we already noticed when working with these types of propositions, the
idea of priority of propositions plays an important role. This prioritization of propositions must be
incorporated in the knowledge base. We note that the ordinary categorical type knowledge has
the highest priority.

In constructing our knowledge base we shall add an additional class to J and D consisting
of our possibilistic non-monotonic propositions. We shall denote this category as N.
Furthermore we shall assume that the propositions in N are ordered with respect to their priority;
there is a hierarchy of these propositions. To avoid unnecessary complexities at this juncture we
assume this is a linear ordering. Thus N is a collection of non-monotonic propositions where

$N = N_q > N_{q-1} > \cdots > N_2 > N_1$.

Here the bigger the index, the higher the priority. This hierarchy is a reflection of the fact
that the higher the priority, the sooner we want to consider that information in the reasoning process.
Thus, if $N_i$ and $N_j$ are such that $i > j$ then we want to use $N_i$ before we use $N_j$. The categorical
propositions in D and J are assumed to be of higher priority than any element in N and as such should
be considered before any element in N. We note prior consideration in the tree evaluation process
means an appearance lower in the knowledge tree.

As  is  the  case  with  the  elements  in  J  and  D  the  propositions  in  N  have  associated 
variables  and  generally  function  by  enforcing  constraints  on  these  variables. 

The inclusion of the component N in our knowledge base affects the process of generating
the knowledge tree described earlier. In the following we shall use a labeled trapezoid, [$N_k$],
to indicate the proposition $N_k$ in N.

We  now  provide  a  modified  version  of  step  2  in  the  tree  generation  process  to  account  for 
the  inclusion  of  non-monotonic  elements  in  N. 

The appearance in the knowledge tree of a triangle labeled with a variable V, not
having itself as an ancestor, causes the emanation from the triangle of a tail consisting of:

i. A serial emanation of labeled trapezoids corresponding to all propositions in N
having V as one of their components. This series begins with the lowest priority proposition and
proceeds in increasing order of priority. Furthermore, emanating from each trapezoid is a special
dashed triangle (see footnote 12) labeled by V, as well as a regular triangle labeled by any other variable
associated with the preceding trapezoid.

ii. The end of the chain of trapezoids causes the emanation from its trailing labeled dashed
triangle of a parallel family of immediate successor nodes.

(a). One corresponding to each proposition in the knowledge base subset D involving
the label of the dashed triangle. These are labeled rectangles.

(b). One corresponding to each proposition in J having the label of the parent dashed
triangle as one of its component variables. These are labeled circles as described earlier.
Finally, if there are no appropriate elements from either (a) or (b) this causes the emanation of a
rectangle labeled by the domain of the variable, [V is X]. In Figure 5 we provide an
illustration of a sub-branch of a knowledge tree generated using this revised method.

12. We use a dashed triangle so as to indicate that this is not a new appearance of the variable but part of the current
tail.

Figure 5. Subtree from Non-monotonic Propositions

The  evaluation  step  is  only  modified  by  an  instruction  for  handling  trapezoidal  nodes.  We 
add  the  following  item. 

3. Any subtree consisting of a trapezoidal node and a family of children which are all
rectangular nodes can be replaced by a rectangular node whose label is determined by the
preceding triangle and whose value is determined by the protoform associated with the
proposition denoted by the trapezoid.


9.  Defuzzification 

Let V be a variable taking its value in the domain X. In the preceding we considered the
problem of finding the value of this variable from a knowledge base. In answering this question
we allowed our answer to be a fuzzy subset. More generally we allowed for an answer that
contains some uncertainty. In some cases we may be required to provide an answer that is exactly
one value from the set X. An example of this occurs in the widely used technology of fuzzy
systems modeling [24, 30]. If the available knowledge about V allows us only to infer that
V is A, where A is a fuzzy subset, we must provide a process to obtain a single element from X using
this information. Here we shall refer to this step as defuzzification, Defuzz. Other closely related
terms for this process are choice, selection and decision.

The process of defuzzification has been studied in the literature [31] and there exist a large
number of methods for implementing this operation. In its most general sense the protoform
associated with defuzzification (choice) is an example of a non-monotonic operation. Let us look at
this operation. In the following we provide a basic protoform of this operation.

Protoform: For Defuzz(V is A)

$P_1$: $V$ is $A$

$Q$: $V = x_q$ where $x_q$ is such that $A(x_q) = \max_{x \in X}[A(x)]$.

We note other forms of Defuzz are possible. For example if X is the real line then Filev
and Yager [32] defined the general BADD defuzzification of V is A as $V = a$ where

$a = \sum_j x_j \frac{A(x_j)^r}{\sum_k A(x_k)^r}$ with $r > 0$.

We see if $r \to \infty$ we get the Max operator.
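A short numeric sketch of the BADD formula, assuming a fuzzy subset with finitely many support points on the real line, stored as a dictionary. The function names and the sample set are illustrative, not taken from [32].

```python
def badd(a, r):
    """BADD defuzzification: sum_j x_j * A(x_j)^r / sum_k A(x_k)^r."""
    weights = {x: grade ** r for x, grade in a.items()}
    total = sum(weights.values())
    return sum(x * w for x, w in weights.items()) / total

def max_defuzz(a):
    """Basic Defuzz protoform: pick an element with maximal membership."""
    return max(a, key=a.get)

# Hypothetical fuzzy subset of the real line with three support points.
A = {1.0: 0.2, 2.0: 1.0, 3.0: 0.5}

center = badd(A, 1)     # weighted average, roughly 2.18
sharp = badd(A, 50)     # very close to 2.0
peak = max_defuzz(A)    # 2.0
```

As $r$ grows the weights concentrate on the element of largest membership, so `badd(A, 50)` is already numerically close to the maximum-membership choice.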


Let us see the non-monotonicity implicit in this operation. Assume we infer V is A from
our current knowledge base. Using defuzzification we get DEFUZZ(V is A) = $a$. Assume
we get more knowledge and subsequently obtain V is $\hat{A}$. In this case we get
DEFUZZ(V is $\hat{A}$) = $\hat{a}$. For the defuzzification process to be monotonic we must have $\hat{a} = a$, a
situation which cannot be guaranteed. At a meta level, the reason for the non-monotonicity
associated with the defuzzification operation is that in choosing a precise element we are
assuming information above and beyond what we have a right to. We are essentially using an
operation related to the principle of maximal entropy, which is non-monotonic.

In using our knowledge tree approach we can include these types of exact questions in the
following manner. We initiate our knowledge tree with a special triangle labeled "V =". The
occurrence of this equal triangle causes the emanation of the Defuzz protoform as shown in
Figure 6. The appearance of the ordinary triangle induces a process of finding V.


Figure 6. Defuzzification subtree


10.  Dempster-Shafer  Granules  for  Knowledge  Representation 

Dempster-Shafer belief structures [33-35] provide a useful class of knowledge
representation tools. For our purposes of representing and manipulating knowledge within
question answering systems we shall find it convenient to use a closely related framework that we
shall refer to as Dempster-Shafer granules, or simply D-S granules. A D-S granule consists of
two components. The first is a collection of n categorical propositions of the type previously
discussed:

$g_1$: $V$ is $A_1$, $g_2$: $V$ is $A_2$, $\ldots$, $g_n$: $V$ is $A_n$

Here $V$ is a joint variable and $A_i$ is a fuzzy subset on its domain $X$. We shall refer to this
collection as the body of the granule. Thus each component in the body has the structure of a
proposition from AR.

In addition, associated with a D-S granule is a mapping $m$ that associates with each
proposition $g_i$ in the body a value $m_i$ such that: 1. $m_i \in [0, 1]$ and 2. $\sum_{i=1}^{n} m_i = 1$. We refer to
this as the qualifier of the granule.

A very special example of a D-S granule is one in which n = 1. In this case the body
consists of a single proposition g_1: V is A and m_1 = 1. This is the same as an ordinary
proposition.

A closely related and slightly more general structure is one in which we allow the values
m_i to be fuzzy numbers. These are closely related to what Zadeh calls
Possibility-Probability granules [26].

In the following we shall describe some of the types of information that can be represented
by D-S granules. One situation is when our knowledge about a variable has a random component.
For example, if V is an atomic variable taking values in X = {x_1, ..., x_n} and we know that
Prob(x_i) = p_i, we can represent it as a D-S granule with components g_i: V is {x_i} and m_i = p_i.

More imprecise knowledge, such as that the probability of a subset B of X is at least α, can
be expressed using a D-S granule where g_1: V is B with m_1 = α, and g_2: V is X with m_2 = 1 − α.

The case where we know that the probability of B is exactly α can be represented with the
D-S granule g_1: V is B with m_1 = α, and g_2: V is not B with m_2 = 1 − α.
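As a rough illustration, the probabilistic case and the "at least" case can be written down as data. The sketch below is only one possible encoding and is not taken from the report: the names Component, make_granule and crisp, the three-element domain, and the numeric weights are all assumptions made for the example. Fuzzy subsets are stored as dictionaries of membership grades, and the probability level is called alpha.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Component:
    """One body proposition 'V is A' together with its weight m."""
    A: dict   # membership grade A(x) for every x in the domain X
    m: float  # weight attached to the proposition

def make_granule(components, tol=1e-9):
    """Return the components as a tuple after checking the qualifier conditions."""
    components = tuple(components)
    if any(not 0.0 <= c.m <= 1.0 for c in components):
        raise ValueError("each weight must lie in [0, 1]")
    if abs(sum(c.m for c in components) - 1.0) > tol:
        raise ValueError("weights must sum to 1")
    return components

def crisp(X, subset):
    """Crisp subset of X written as a 0/1 membership function."""
    return {x: 1.0 if x in subset else 0.0 for x in X}

X = ["x1", "x2", "x3"]  # hypothetical three-element domain

# Random component: Prob(x_i) = p_i becomes one singleton proposition per element.
p = [0.5, 0.3, 0.2]
probabilistic = make_granule(Component(crisp(X, {x}), p_i) for x, p_i in zip(X, p))

# "The probability of B is at least alpha": weight alpha on B, the rest on all of X.
alpha, B = 0.7, {"x1", "x2"}
at_least = make_granule([Component(crisp(X, B), alpha),
                         Component(crisp(X, set(X)), 1 - alpha)])
```

Placing the remaining weight 1 - alpha on the whole domain X, rather than on any particular subset, is what leaves the probability of B free to exceed alpha.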

Quantified propositions can be expressed using D-S structures. A simple example of this
is "Most high income Americans live in urban areas." Let us see how D-S granules help us
represent this type of knowledge. Let V_z and U_z be joint variables and let Q be a non-decreasing
type quantifier such as at least α. Consider the proposition

For Q z ∈ Z, if V_z is A then U_z is B.

(For most people, if Income is HIGH then Residence is URBAN AREA.)

We can represent this by a collection of D-S granules such that for each z ∈ Z we have

g_1: If V_z is A then U_z is B, with m_1 = α
g_2: If V_z is A then U_z is X, with m_2 = 1 − α

Here X is the domain of U.

The meta protoform for manipulating D-S granules makes considerable use of the fact that
the body of the D-S granule is made up of propositions of the type we have already dealt with.

Assume M1 is a D-S granule with body consisting of propositions g_i for i = 1 to n and
associated weights m_i. Let M2 be another independent D-S granule with body consisting of
propositions ĝ_j for j = 1 to n̂ and associated weights m̂_j. The protoform for combining these two
D-S granules, denoted PF(M1, M2), is

P1: M1
P2: M2
Q: M

Here M is a D-S granule whose body consists of the n·n̂ propositions h_ij = PF[g_i, ĝ_j], where PF[g_i, ĝ_j]
is the protoform associated with the components. In addition, associated with each object h_ij in M
is a weight m_i·m̂_j. We emphasize the autonomy of the operations performed on the components
of the body of the granules, the g_i and ĝ_j, from the operations performed on the qualification
weights, the m_i and m̂_j. We refer to this as Body-Qualification autonomy, or B-Q autonomy.

We illustrate this protoform with a simple example of modus ponens. Assume the granule
M1 is made up of g_1: V is B_1 with m_1 = α, and g_2: V is B_2 with m_2 = 1 − α. Let M2 be made up
of ĝ_1: If V is A_1 then U is F with m̂_1 = β, and ĝ_2: If V is A_1 then U is X with m̂_2 = 1 − β.

Here the internal operation PF[g_i, ĝ_j] is simply modus ponens. We recall that this
protoform is P1: V is B, P2: if V is A then U is F ⇒ U is D, where D(y) = Poss[not A | B] ∨ F(y).

Using this on M1 and M2 we get M such that
h_11: D_11 = Poss[not A_1 | B_1] ∨ F(y) with weight βα
h_21: D_21 = Poss[not A_1 | B_2] ∨ F(y) with weight (1 − α)β
h_12: D_12 = X with weight 1 − β (the two components obtained from ĝ_2 both yield X, so their weights α(1 − β) and (1 − α)(1 − β) are combined)
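The combination step can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the report's implementation: fuzzy subsets are dictionaries over small finite domains, the function names (poss, modus_ponens, combine) and all numeric values are invented for the example, and the modus ponens step assumes the reading D(y) = Poss[not A | B] ∨ F(y), with not A(x) = 1 − A(x).

```python
def poss(A, B):
    """Poss[V is A / V is B] = Max_x[A(x) ^ B(x)] on a shared finite domain."""
    return max(min(A[x], B[x]) for x in A)

def modus_ponens(B, rule):
    """From 'V is B' and 'if V is A then U is F' infer 'U is D'.

    Assumption: D(y) = Poss[not A | B] v F(y), with not A(x) = 1 - A(x).
    """
    A, F = rule
    not_A = {x: 1.0 - A[x] for x in A}
    level = poss(not_A, B)
    return {y: max(level, F[y]) for y in F}

def combine(M1, M2, pf):
    """Apply pf to every pair of body components and multiply the weights.

    The body operation (pf) and the weight operation (product) never
    interact, which is the body-qualification autonomy described above.
    """
    return [(pf(g, g_hat), m * m_hat) for g, m in M1 for g_hat, m_hat in M2]

# Hypothetical domains and sets, chosen only for illustration.
B1 = {1: 1.0, 2: 0.0, 3: 0.0}
B2 = {1: 0.0, 2: 0.0, 3: 1.0}
A1 = {1: 1.0, 2: 1.0, 3: 0.0}
F = {"low": 0.0, "high": 1.0}
whole = {"low": 1.0, "high": 1.0}   # the full domain of U
alpha, beta = 0.6, 0.9

M1 = [(B1, alpha), (B2, 1 - alpha)]
M2 = [((A1, F), beta), ((A1, whole), 1 - beta)]
M = combine(M1, M2, modus_ponens)   # four (fuzzy subset, weight) pairs
```

In this example B1 lies inside A1, so the first pair returns F itself, while B2 lies outside A1 and the rule tells us nothing, giving the whole domain of U. The two pairs built from the second rule also give the whole domain, and their weights add up to 1 − beta.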

The process of reasoning with tree-like structures that contain knowledge represented by
D-S granules can get complex and raises some very fundamental issues. Notable among these are
those related to independence and the lack of idempotency when using the same knowledge more
than once in a tree. Rather than superficially touching on this very complex and important topic
we refer the reader to the work of Shafer, Shenoy and their colleagues [36, 37] on reasoning with
D-S granules in tree-like structures.


PART III. UNCERTAINTY MANAGEMENT FOR INTELLIGENCE ANALYSIS


1.  Introduction 

In  his  report  to  Congress  [1],  David  A.  Kay,  who  led  the  US  government's  efforts  to  find 
evidence  of  Iraq's  illicit  weapons  programs,  reported  that  the  current  intelligence  systems  dealing 
with  weapons  of  mass  destruction  are  increasingly  based  on  limited  information.  In  light  of  this 
situation, he indicated that modern intelligence analysis systems need a way for an analyst to say,
"I don't have enough information to make a judgment," a capacity that he felt the current
intelligence systems do not possess. Central to attaining this capability is the ability to deal with
uncertain and imprecise information. We believe that fuzzy logic with its focus on uncertainty
can help. It has the ability to simultaneously exploit both precise formal measurements of the
type obtained from state of the art electronic and mechanical monitoring devices as well as the type
of imprecise information obtained from human sources which is often perception based and
expressed  in  linguistic  terms.  Here,  we  begin  to  look  at  the  possibilities  of  using  fuzzy  logic  [2] 
and  related  soft  computing  technologies  to  provide  the  tools  necessary  to  supply  this  capability  to 
intelligence  analysts.  As  we  shall  subsequently  see,  the  dual  measures  of  possibility  and  certainty 
[3]  provide  a  useful  way  of  formalizing  the  concept  of  not  knowing  with  certainty. 


2.  Variables  and  Question  Answering 

By a variable we shall mean an attribute associated with some specific object. Thus, if V
is a variable then V = attribute(object). John's age and the number of nuclear devices possessed
by North Korea are examples of variables. In the first case, the attribute is age and the object is
John. In the second case, the attribute is the quantity of nuclear devices and the object is North
Korea. Typically with a variable, we assume it has a domain, X, consisting of the set of possible
values. In many situations a task of great interest is the answering of some question about a
variable. For example, is John over 65? Another closely related task is that of making a decision
in which knowledge about a variable is central to the decision. For example, a bartender deciding
whether to serve John a drink must ascertain that his age is at least 21.

We emphasize the distinction between the task of finding the value of a variable and that
of answering a question about a variable. Clearly, although knowing the value of a variable can
help in answering a question, it is not always necessary. That is, there can be some uncertainty
about the value and still we can answer a question about the variable with certainty.

In order to be able to answer a question about the value of a variable, we must draw upon
all our sources of information about the variable. Figure 7 illustrates this situation. The
information provided by these sources may be related to the variable of interest in a number of
different ways. It may be information directly about the value of the variable of interest, such as an
observation on the age of John; an example of this is a birth certificate. It may be about the
attribute without specific reference to John, such as "human beings typically live no more than about 85
years." It may be information about the value of another attribute associated with John, "the color
of John's hair is grey." It may be information relating the variable of interest to other attributes or
variables, "John is five years younger than Mary." Furthermore, each of these pieces of
information may have different degrees of credibility. In addition, the information from the
sources may be obtained from precise measurement or may be based upon perceptions and
observations. It may be expressed formally or in linguistic terms.


[Diagram labels: Question, PROCESSING, Answer, Relevant Information]

Figure 7. Task of answering a question

The process of answering the question about the attribute involves combining this
information. In some cases, this process may involve a fusing of the available information to
obtain an effective value for the variable. The answer to the question of interest is obtained with
respect to this fused information. In some cases, the answer to the question may be obtained using a
process that doesn't depend upon obtaining an effective value.


3. Basic Knowledge Representation Using Fuzzy Sets

Among the central tasks involved in providing an answer to a question is the
representation of the relevant information in a manner that allows formal manipulation. The
representational language should be rich enough to allow the modeling of different types of
information. Fuzzy subsets provide the basis for a very expressive framework for the
representation of a wide body of knowledge. This knowledge can be either precise or imprecise.
The framework can also be used to represent knowledge expressed using linguistic values. Here, we shall briefly
discuss this representational capability; however, we note the extensive literature on this subject
and specifically refer the reader to the work of Zadeh under his paradigm of computing with
words [4, 5] and the related theory of approximate reasoning [6-10].

Within the framework provided by fuzzy sets, knowledge about the value of a variable V is
expressed using a statement V is A where A is a fuzzy subset of the domain X. The use of this
type of representation can be seen as a generalization of the idea of imposing a constraint on the
value of V, such as saying that V lies in the subset B, when B is a crisp subset of X. An example
of this is saying John's age is between 25 and 35. The use of fuzzy subsets allows for a grading of
this concept of V lying in the set B. Hence, the statement V is A manifests a constraint on the
value of the variable V. The assignment of a fuzzy subset A to the variable induces a possibility
distribution on X such that A(x) indicates the possibility that x is the value of V.

These types of fuzzy assignments can arise in, although are not restricted to, situations in
which the information about the value of the variable is initially expressed in linguistic terms. An
example of this would be the observation that John is middle aged. In this case, the fuzzy subset
is the representation of the linguistic term middle-age. Here, the definition of the fuzzy subset A
is such that for x ∈ X the membership grade A(x) is the compatibility of the age x with the
concept being represented, middle-age. We should note that while the use of a crisp subset allows
for a representation of uncertainty of value, fuzzy subsets allow for a more sophisticated
representation. They allow for more than just a simple distinction between those values that are
possible and those that are impossible; they allow a grading of possibility.
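To make the contrast between a crisp and a graded constraint concrete, the following sketch builds both as membership functions over ages. The trapezoidal shape and its breakpoints (35, 45, 55, 65) are purely illustrative assumptions; the report does not specify how middle-age should be defined.

```python
def trapezoid(a, b, c, d):
    """Membership function that rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    def grade(x):
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:
            return (x - a) / (b - a)
        return (d - x) / (d - c)
    return grade

# Hypothetical definition of the linguistic term "middle-aged".
middle_aged = trapezoid(35, 45, 55, 65)

# Crisp constraint "John's age is between 25 and 35": every age is either in or out.
def between_25_and_35(x):
    return 1.0 if 25 <= x <= 35 else 0.0

# A(x) is read as the possibility that x is John's age.
grades = {age: middle_aged(age) for age in (30, 40, 50, 60)}
```

Under these assumed breakpoints an age of 50 is fully compatible with middle-age, ages 40 and 60 are compatible to degree 0.5, and an age of 30 is not compatible at all, whereas the crisp set can only answer 0 or 1.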

We note that, in the case where A = {x}, the statement V is A is equivalent to saying
that V = x. Another special case is when A = X. Here, the statement V is X is equivalent to
saying that we don't know. If B is some crisp subset of X, then the statement V is B is equivalent
to saying the value of V lies in B. The situation when A = ∅, the null set, corresponds to the case
where we are saying our knowledge is that V is not in X. This situation indicates a complete
conflict with our assumption that V must take its value in X. More generally, if A is such that
Max_x[A(x)] < 1 then we have some degree of conflict with the assumption that V has X as its
domain. We shall say a fuzzy subset is normal if there exists at least one x ∈ X so that A(x) = 1.
If Max_x[A(x)] < 1 we say A is subnormal.

Consider the situation where we have the knowledge that V lies in B, V is B, where B is
a crisp subset of X. From this, we can naturally infer that V lies in E where B ⊆ E; here E is any
set containing B. Thus, knowing that John's age is between 25 and 35 allows us to infer that
John's age is between 10 and 50. In the fuzzy framework this generalizes to what is called the
entailment principle [11]. This principle states that, from the knowledge that V is A, we can
infer V is F where A ⊆ F. We recall that for fuzzy subsets A ⊆ F if A(x) ≤ F(x) for all x.

Clearly, the knowledge that V is contained in [25, 35] is more informative, that is less uncertain,
than the knowledge that V is contained in [10, 50]. Furthermore, the statement that V is 25 is even
more informative than either of the preceding, as it contains no uncertainty. In [12, 13, 14], we
introduced the concept of specificity to measure the amount of information contained in a fuzzy
proposition V is A. Specificity is inversely related to the idea of uncertainty: the more specific, the
more certain our knowledge.

Definition: Assume A is a fuzzy subset over X. Let x* be such that A(x*) = Max_x[A(x)]; it is
an element having the maximal membership grade in A. Let Â be the average membership grade
of A over the space X − {x*}; it is the average over all elements except x*. The specificity of A,
denoted Sp(A), is defined as Sp(A) = A(x*) − Â; it is the difference between the highest
membership grade and the average of all the other elements.

Note 1: If more than one element attains the highest membership grade then all except one of
these are used to find the average.

Note 2: We note the specificity of the statement V is A is equal to Sp(A). Thus we use the terms
Sp(V is A) interchangeably with Sp(A).

[Footnote 13] While a number of different formal definitions have been suggested, we shall find this one to be useful for our purposes
as it simply captures the basic idea of the concept of specificity.

We can observe some properties of Sp(A):

1. It lies in the unit interval, 0 ≤ Sp(A) ≤ 1.

2. Sp(A) = 1 iff there exists one element x* such that A(x*) = 1 and all other
elements have A(x) = 0.

3. If A(x) = c for all x, then Sp(A) = 0.

4. If A and B are two normal fuzzy subsets, that is each has at least one element with
membership grade 1, and A ⊇ B, then Sp(B) ≥ Sp(A). Thus containment in the case of normality
means an increase of specificity.

Note:  Essentially  specificity  measures  the  degree  to  which  V  is  A  points  to  one  and  only  one 
element  as  the  value  of  V. 
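The definition of specificity translates directly into code for a finite domain. The sketch below is an illustration only: the function name, the dictionary representation, the age domain 0 to 100, and the convention used for a one-element domain (where there are no other elements to average) are assumptions.

```python
def specificity(A):
    """Sp(A): highest membership grade minus the average of all the other grades."""
    grades = sorted(A.values(), reverse=True)
    top, rest = grades[0], grades[1:]
    if not rest:
        # Assumed convention for a one-element domain: nothing to average over.
        return top
    # If several elements tie for the maximum, all but one of them fall into `rest`.
    return top - sum(rest) / len(rest)

ages = range(0, 101)  # hypothetical domain of ages
narrow = {x: 1.0 if 25 <= x <= 35 else 0.0 for x in ages}   # V in [25, 35]
wide = {x: 1.0 if 10 <= x <= 50 else 0.0 for x in ages}     # V in [10, 50]
exact = {x: 1.0 if x == 25 else 0.0 for x in ages}          # V is 25
unknown = {x: 1.0 for x in ages}                            # V is X

ranking = sorted(
    [("exact", specificity(exact)), ("narrow", specificity(narrow)),
     ("wide", specificity(wide)), ("unknown", specificity(unknown))],
    key=lambda pair: pair[1], reverse=True)
```

On this domain the ordering comes out as exact, narrow, wide, unknown, matching the intuition that the statement "V is 25" is the most informative and "V is X" the least.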

As we shall subsequently see, the measure of specificity can play an important role in the
processing of information. Consider the statement V is A where A is a normal fuzzy subset, that
is there exists at least one element that has full possibility of being the value of V. We earlier
noted that if B_1 is such that B_1 ⊆ A while remaining normal then V is B_1 provides more
information about the value of V than the original statement V is A. Essentially in this case we
introduced some degree of clarity; we reduced the uncertainty by reducing the possibility of some
elements while still leaving the possibility of finding a solution. On the other hand, if A ⊆ B_2
then V is B_2 provides less information than V is A. In this case, we have reduced our certainty
because we have added more possibilities. A third situation is where we have V is B_3 with
B_3 ⊆ A but with B_3 subnormal, Max_x[B_3(x)] < 1. We don't have a solution completely compatible
with the assumption that V lies in X. In this case, we can possibly have less information than the
original statement V is A, Sp(B_3) < Sp(A). More generally, a reduction of specificity (certainty)
in our knowledge can come about from two sources, one being increased possibility and the other
being an increase in conflict with the assumption that its value lies in the given domain.


4. On the Measures of Possibility and Certainty


As  we  earlier  noted,  a  task  of  great  interest  is  the  answering  of  a  question  about  some 
variable.  Here,  we  shall  concern  ourselves  with  this  issue  when  our  knowledge  about  the  variable 
as  well  as  the  question  is  represented  using  the  preceding  representation.  Given  the  knowledge 
that  V  is  A  our  task  is  the  determination  of  the  validity  of  the  statement  V  is  B. 

In order to build our intuition, we shall initially consider the case in which the sets A and
B are crisp sets. There are two situations regarding our knowledge of A. In the first, we have no
uncertainty regarding our knowledge of V, V = x_1; here A = {x_1}. In this situation, we can very
clearly answer our question about the truth of the statement V is B. If x_1 ∈ B then the answer is
yes; if x_1 ∉ B then the answer is no. This exact information with respect to the value of V leads
to precise answers.

The  second  case  is  where  A  is  not  a  singleton,  there  exists  some  uncertainty  about  the 
value  of  V.  This  is  the  more  typical  situation,  as  noted  by  Kay  [1],  in  intelligence  analysis.  The 
uncertainty  associated  with  the  knowledge  that  V  is  A  makes  the  clear  determination  of  whether 
another  statement  V  is  B  is  true  or  false  not  always  attainable.  Using  figure  8  can  help  us 
understand  the  situation  when  A  is  uncertain. 


[Diagram panels: Case 1, Case 2, Case 3]

Figure 8. Different relations between knowledge and question

We see in case 1 knowing that V is A assures us that V is B is valid. In case 2, knowing
that V is A assures us that V is B is not true. Finally, in case 3, we can't tell. Thus we observe
from this crisp environment that we have the following rules regarding the determination of truth
of the statement V is B given V is A:

If A ⊆ B then the answer is yes
If A ∩ B = ∅ then the answer is no
If A ∩ B ≠ ∅ and A ⊄ B then the answer is I don't know

Thus a clear answer to questions in the face of uncertainty in our
knowledge is not always attainable. We note this situation holds even in the special case when B
is a singleton. We see that asking if V = 30 when we only know that V ∈ [25, 40] can't be answered
yes or no; the appropriate answer is I don't know.
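The three crisp rules amount to a small three-valued test on sets. The following sketch uses ordinary Python sets over integer ages; the function name and the example ranges are assumptions chosen to mirror the examples in the text.

```python
def crisp_answer(A, B):
    """Answer 'is V in B?' when all we know is that V lies in the non-empty crisp set A."""
    A, B = set(A), set(B)
    if not A:
        raise ValueError("the knowledge set A must be non-empty")
    if A <= B:            # A is contained in B
        return "yes"
    if not (A & B):       # A and B do not intersect
        return "no"
    return "I don't know"  # A overlaps B without being contained in it

# Knowing the age is in [25, 35] settles the question "is it in [10, 50]?".
contained = crisp_answer(range(25, 36), range(10, 51))
# Knowing only that V is in [25, 40] cannot settle "is V = 30?".
singleton_question = crisp_answer(range(25, 41), {30})
# Knowing V = 70 settles "is V under 65?" in the negative.
disjoint = crisp_answer({70}, range(0, 65))
```

The second call shows that the "I don't know" outcome arises even when the question concerns a single value.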

In  the  fuzzy  set  environment  more  sophisticated  tools  are  needed  to  address  this  problem. 
Two  measures  have  been  introduced  by  Zadeh  [3]  to  help.  These  are  the  measures  of  possibility 
and  certainty.  We  note  that  Dubois  and  Prade  [15,  16]  refer  to  the  measure  of  certainty  as  the 
measure of necessity. In the following, we shall, unless otherwise stated, assume A and B are
normal.

The possibility that V is B given V is A is denoted by Poss[V is B / V is A] and is defined
as Poss[V is B / V is A] = Max_x[D(x)] where D(x) = Min[A(x), B(x)]. Thus

Poss[V is B / V is A] = Max_x[A(x) ∧ B(x)]

Since D = A ∩ B, we see that Poss[V is B / V is A] is the maximum degree of intersection
between A and B.

The second measure introduced by Zadeh is the measure of certainty. We define this as

Cert[V is B / V is A] = 1 − Poss[V is not B / V is A] = 1 − Max_x[A(x) ∧ (1 − B(x))]

With some manipulation we attain

Cert[V is B / V is A] = Min_x[(1 − A(x)) ∨ B(x)]

We observe that Cert[V is B / V is A] indicates the degree to which A is contained in B. That
is, if A is contained in B the knowledge that V is in A assures us that it is in B.

These  measures  of  possibility  and  certainty  can  be  seen  as  respectively  providing  upper 
and  lower  (optimistic  and  pessimistic)  bounds,  on  the  answer  to  the  question  of  whether  V  is  B  is 
true  given  we  know  that  V  is  A. 

We note that if A is a normal fuzzy subset, so that there exists an x* such that A(x*) = 1, then
Poss[V is B / V is A] ≥ B(x*) and Cert[V is B / V is A] ≤ B(x*). Thus we see when A is normal,
we have Cert[V is B / V is A] ≤ Poss[V is B / V is A].
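Both measures are easy to compute on a finite domain. The sketch below is illustrative only: fuzzy subsets are dictionaries sharing the same keys, the complement is taken as one minus the membership grade, and the four ages and their grades are hypothetical.

```python
def poss(A, B):
    """Poss[V is B / V is A] = Max_x[A(x) ^ B(x)]."""
    return max(min(A[x], B[x]) for x in A)

def cert(A, B):
    """Cert[V is B / V is A] = Min_x[(1 - A(x)) v B(x)]."""
    return min(max(1.0 - A[x], B[x]) for x in A)

# Knowledge: "John is middle aged" (hypothetical grades on a four-point domain).
A = {30: 0.0, 40: 0.5, 50: 1.0, 60: 0.5}
# Question: "is John over 45?" (a crisp set on the same domain).
B = {30: 0.0, 40: 0.0, 50: 1.0, 60: 1.0}

upper = poss(A, B)   # optimistic bound on the truth of "V is B"
lower = cert(A, B)   # pessimistic bound on the truth of "V is B"

# Cross-check of the duality Cert[B / A] = 1 - Poss[not B / A].
not_B = {x: 1.0 - B[x] for x in B}
dual = 1.0 - poss(A, not_B)
```

With these grades the answer is bracketed by a certainty of 0.5 and a possibility of 1: it is fully possible that John is over 45, but the age 40 remains possible to degree 0.5, so we are only half certain.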

Let  us  look  at  these  measures  for  some  special  cases  of  A  and  B.  We  first  consider  the  case 
where  A  and  B  are  crisp.  In  this  case,  Cert[B/A]  and  Poss[B/A]  must  be  either  one  or  zero.  We 
see  that  if  Cert[B/A]  =  1  then  Poss[B/A]  =  1  and  this  corresponds  to  the  case  where  V  is  B  is  true. 
If  Poss[B/A]  =  0  then  Cert[B/A]  =  0  and  we  know  that  V  is  B  is  false.  If  Cert[B/A]  =  0  while 
Poss[B/A]  =  1  then  we  are  in  the  situation  in  which  the  answer  is  unknown. 

Consider the situation where B is a crisp subset and A can be fuzzy. Here we have that

Poss[V is B / V is A] = Max_{x ∈ B}[A(x)]

Cert[V is B / V is A] = Min_{x ∉ B}[1 − A(x)] = 1 − Max_{x ∉ B}[A(x)]

An important special case of this is where B = {x*}; here we are interested in determining
whether V is equal to some particular value. In this case we see that Poss[V is B / V is A] =
A(x*) and Cert[V is B / V is A] = Min_{x ≠ x*}[1 − A(x)] = 1 − Max_{x ≠ x*}[A(x)]. The certainty is the
negation of the largest possibility of a value not equal to x*. We also observe that if A(x*) ≠ 1 then we
must have Cert[V is x* / V is A] = 0. This follows since with normal sets there exists some
element x_j ≠ x* with A(x_j) = 1 and hence 1 − Max_{x ≠ x*}[A(x)] = 0.

We also observe in the case where X = {x_1, x_2}, if we ask is V = x_1, we see that
Cert[V is x_1 / V is A] = 1 − A(x_2). It is simply the negation of the possibility of the other element.

Consider now the special case where A is a crisp set. Here

Poss[V is B / V is A] = Max_{x ∈ A}[B(x)]

Cert[V is B / V is A] = Min_x[(1 − A(x)) ∨ B(x)] = Min_{x ∈ A}[B(x)].

If additionally we assume that A = {x_1}, the value of V is exactly known, then
Poss[V is B / V is A] = B(x_1) and Cert[V is B / V is A] = B(x_1);
thus B(x_1) is the validity of the statement that V is B.

Some clarification may be useful here. What we have shown is that generally when our
information about a variable, V is A, has some uncertainty the answer to any question about the
truth of the statement V is B lies in some interval. Thus if A is not a singleton the truth of V is B
lies in the interval [l, u] where l is the certainty of V is B and u is the possibility that V is B. Here
[l, u] is a subset of the unit interval. On the other hand if A is a singleton then the truth of V is B
is a precise value b in the unit interval. Additionally, when A is a singleton and
B is a crisp set, b equals one or zero. The important point here is that there are two
manifestations. One is a result of our lack of certainty regarding the knowledge of A: it is
granular and not a singleton, and generally this results in an interval for our truth value. The
second issue is related to a lack of crispness. The sets involved are fuzzy; this generally
introduces aspects of multi-valued logic, in that l, u and b are not necessarily one or zero but can be
anywhere in the unit interval.

In cases where the decision process requires a more precise determination of the validity
of the proposition V is B than that provided by the interval [l, u], we must provide some means around
this difficulty. However, we must emphasize that the actual processing of the information about
the  variable  V  has  left  us  with  some  uncertainty.  In  some  cases,  we  may  be  able  to  draw  upon 
techniques  from  decision-making  under  uncertainty  [17]  to  help  make  decisions  in  this  kind  of 
environment.  First  we  want  to  make  a  clear  distinction  between  the  analyst,  such  as  an 
intelligence  analyst,  and  what  we  shall  call  the  executive.  It  is  the  executive  who  makes  the 
decision  using  as  some  of  his  input  the  information  provided  by  the  analyst.  While  it  is  not  our 
purpose  here  to  go  into  great  detail  about  the  executive  task  of  decision  making,  since  we  are 
more  interested  in  the  analysis  task,  we  shall  make  a  few  comments. 

In making a decision, such as whether we should preemptively strike an adversary, in
addition to the information provided by the analyst about the state of V which may be uncertain,
such as whether they have weapons of mass destruction, an executive generally draws upon two
other types of information [18]. The first type of information is related to the costs or payoffs
associated with the choice of an action and the possible states of the uncertain variable V.
Formally this is often expressed using a payoff matrix. The second type of information is related
to what we call the decision maker's attitudinal character [19]. This component of the decision
process has an extremely subjective nature. It is here people can have strong differences of
opinion, which are purely value and preference driven. Thus one executive in the face of an
uncertainty regarding the relevant variable may decide to act in a way that defends against the
worst possibility, the so-called Max-Min decision maker [20]. Given an appropriate use of this
with respect to the available knowledge of the possible outcomes this type of decision cannot be
said to be right or wrong. The point we want to make here is that uncertainty in our knowledge
provides space for the inclusion of subjective choices by the executive making the decision. A
simple example of this may involve preparing for a party in which we are not sure whether 20 or
500 people are coming. Clearly if we prepare for 500 and only 20 show up then we wasted a lot of
money. On the other hand if we prepare for 20 and 500 show up we have some embarrassment.
The choice of how many people to prepare for is based on the subjective preferences of the party
giver, the executive, with regard to being embarrassed or wasting money; there is no right or
wrong decision.
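The party example can be turned into a small payoff matrix to show how a Max-Min decision maker operates and how the subjective payoffs drive the outcome. Everything numeric below is hypothetical: the report gives no payoff values, so the figures only encode two different attitudes toward embarrassment versus wasted money.

```python
def max_min_choice(payoff):
    """Pick the action whose worst-case payoff is largest (Max-Min attitude)."""
    return max(payoff, key=lambda action: min(payoff[action].values()))

# payoff[action][state]: state is the number of guests who actually show up.
# Host who dreads embarrassment more than waste (hypothetical values).
cautious_host = {
    "prepare_for_20":  {20: 0, 500: -50},   # -50: embarrassment if 500 arrive
    "prepare_for_500": {20: -30, 500: 0},   # -30: money wasted if only 20 arrive
}
# Host who minds waste more than embarrassment (hypothetical values).
frugal_host = {
    "prepare_for_20":  {20: 0, 500: -10},
    "prepare_for_500": {20: -30, 500: 0},
}

cautious_decision = max_min_choice(cautious_host)
frugal_decision = max_min_choice(frugal_host)
```

Both hosts apply the same Max-Min rule to the same uncertainty about attendance, yet they reach different decisions because their payoffs differ, which is the sense in which neither decision can be called right or wrong.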

In the preceding, we assumed normality with respect to all the sets involved; all sets were
assumed to have at least one element with a membership grade 1. Here, we shall make some
comments about the situation with respect to sub-normality, Max_x[A(x)] < 1. First, we note
sub-normality is generally a reflection of some conflict. Sub-normality usually arises from the
combination of information from different sources when there is some conflict between the
observations of the sources. A second way it can arise is when the information provided by an
individual source is in conflict with the assumption about the domain of a variable. This type of
situation occurs less frequently. Thus we shall assume that our primary information supplied by
the individual sources is normal.

In formal reasoning systems based on logic, the appearance of conflicting statements
results in a situation in which we can infer anything; we conclude that everything is true. Our
system has a similar property. Assume V is A and $A = \emptyset$; then for any statement V is B
we have

Cert[V is B / V is A] = $\min_x[\bar{A}(x) \vee B(x)] = 1$,

where $\bar{A}(x) = 1 - A(x)$. Thus, in the face of complete conflict, everything is certain. On
the other hand, with $A = \emptyset$, Poss[V is B / V is A] = $\max_x[A(x) \wedge B(x)] = 0$.
Thus nothing is possible but everything is certain. In order to avoid this difficulty of having
the certainty greater than the possibility, we shall use as our definition of certainty

Cert[V is B / V is A] = $(\min_x[\bar{A}(x) \vee B(x)]) \wedge (\max_x[A(x)])$.

If A is normal, this is just the definition of certainty we previously used, Cert[V is B / V is A] =
$\min_x[\bar{A}(x) \vee B(x)]$. If $A = \emptyset$ we get Cert[V is B / V is A] = 0. More
generally, using this definition we always get Cert[V is B / V is A] $\le$ Poss[V is B / V is A].
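As a rough illustration, the revised certainty definition can be sketched in Python. The
dictionary representation of fuzzy sets and the function names `poss` and `cert` are assumptions
made for this sketch; they do not come from the report.

```python
# Fuzzy sets are modelled as dicts mapping each domain element to a grade in [0, 1].
# (This representation is an assumption of the sketch.)

def poss(a, b):
    """Poss[V is B / V is A] = max_x min(A(x), B(x))."""
    return max(min(a[x], b[x]) for x in a)


def cert(a, b):
    """Revised certainty: min_x max(1 - A(x), B(x)), capped by max_x A(x)."""
    unrestricted = min(max(1.0 - a[x], b[x]) for x in a)
    return min(unrestricted, max(a.values()))


domain = ["x1", "x2", "x3"]
empty = {x: 0.0 for x in domain}             # A is the empty set (complete conflict)
normal = {"x1": 1.0, "x2": 0.4, "x3": 0.0}   # a normal fuzzy set
b = {"x1": 1.0, "x2": 0.7, "x3": 0.2}

print(cert(empty, b), poss(empty, b))    # the cap forces the certainty down to 0
print(cert(normal, b), poss(normal, b))  # for normal A the earlier definition is recovered
```

With the empty set as A, the uncapped minimum would be 1, so the cap by $\max_x[A(x)]$ is what
removes the "everything is certain" effect in this example.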

We make one further comment with respect to normality. Previously, we defined the
entailment principle as saying that from V is A we can infer V is B where $A \subseteq B$. This
must be modified to say that B must also satisfy $\max_x[B(x)] \le \max_x[A(x)]$. Thus if A is
normal no additional restriction exists on B. On the other hand, if $\max_x[A(x)] = a$ then any
statement V is B inferred from this must satisfy both $A \subseteq B$ and $\max_x[B(x)] \le a$.
The inferred set B can't be more possible than the original set A.


5.  Hedging  on  our  Data 

In the preceding, we introduced V is A as a structure for representing uncertain knowledge,
where A is a fuzzy subset of the domain X of V. We indicated that this generalized the idea of
knowing that V lies in some subset. More generally, this formulation imposes some constraint on
the value that V can assume. One question we considered was determining whether the
proposition V is $x^*$ is valid given the knowledge V is A. We showed that, with uncertainty in
our knowledge about V, the best we could do was to put some bounds on the truth of the
hypothesis that V is $x^*$. In particular, Poss[V is $x^*$ / V is A] = $A(x^*)$ provided an upper
bound and Cert[V is $x^*$ / V is A] = $\min_{x \ne x^*}[\bar{A}(x)] = 1 - \max_{x \ne x^*}[A(x)]$
provided a lower bound. If we let $B = \{x^*\}$ then not $x^*$ is $\bar{B} = X - \{x^*\}$ and we
see that Cert[V is $x^*$ / V is A] = $1 -$ Poss[V is not $x^*$ / V is A].

We now consider the situation where we want to hedge on the knowledge that V is A. We
let $\alpha \in [0, 1]$ indicate the degree of confidence we attribute to the proposition V is A,
that is, our knowledge V is A is $\alpha$-certain. In [21] it was suggested that one can express
this hedged knowledge as a proposition V is F where
$F(x) = \max[A(x), \bar{\alpha}] = A(x) \vee \bar{\alpha}$. Since $\bar{\alpha} = 1 - \alpha$, we
see that if $\alpha = 1$ then $F(x) = A(x)$ and we get our original unhedged proposition. If
$\alpha = 0$ then $\bar{\alpha} = 1$ and $F(x) = 1$ for all x. Here, our statement V is F
effectively carries no information. Essentially this hedging loosens the constraint on the
variable V.

In the following we shall let $A^*$ denote the fuzzy set such that $A^*(x) = 1$ if $x = x^*$ and
$A^*(x) = 0$ if $x \ne x^*$.

Let us see what happens to our measures of possibility and certainty in this hedged
situation:

Poss[V is $x^*$ / V is A is $\alpha$-cert] = $\max_x[(A(x) \vee \bar{\alpha}) \wedge A^*(x)] = A(x^*) \vee \bar{\alpha}$.

In the case of certainty we have

Cert[V is $x^*$ / V is A is $\alpha$-cert] = $1 - \max_{x \ne x^*}[F(x)] = \min_{x \ne x^*}[\bar{F}(x)]$.

Since $F(x) = \bar{\alpha} \vee A(x)$, then Cert[V is $x^*$ / V is A is $\alpha$-cert] =
$1 - \max_{x \ne x^*}[\bar{\alpha} \vee A(x)] = 1 - (\bar{\alpha} \vee \max_{x \ne x^*}[A(x)]) =
\alpha \wedge \min_{x \ne x^*}[\bar{A}(x)] = \min[\alpha, \min_{x \ne x^*}[\bar{A}(x)]]$. More
intuitively, we see that Cert[V is $x^*$ / V is A is $\alpha$-cert] = Min[$\alpha$, Cert[V is
$x^*$ / V is A]]; it is the smaller of $\alpha$ and the certainty of the unhedged situation.

In anticipation of what we shall do in the following, we shall refer to these as optimistic
and pessimistic measures:

Opt(V is $x^*$ / V is A is $\alpha$-cert) = Poss[V is $x^*$ / V is A is $\alpha$-cert] = $A(x^*) \vee \bar{\alpha}$

Pess(V is $x^*$ / V is A is $\alpha$-cert) = Cert[V is $x^*$ / V is A is $\alpha$-cert]

= $\alpha \wedge \min_{x \ne x^*}[\bar{A}(x)]$

= $\alpha \wedge (1 -$ Poss[V is not $x^*$ / V is A]$)$
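The hedged measures above can be sketched as follows. This is a minimal illustration, assuming
fuzzy sets are stored as dictionaries; the names `hedge`, `opt` and `pess` are hypothetical.

```python
def hedge(a, alpha):
    """F(x) = max(A(x), 1 - alpha): the alpha-certain version of A."""
    return {x: max(grade, 1.0 - alpha) for x, grade in a.items()}


def opt(a, x_star, alpha):
    """Optimistic measure: A(x*) v (1 - alpha)."""
    return max(a[x_star], 1.0 - alpha)


def pess(a, x_star, alpha):
    """Pessimistic measure: alpha ^ min over x != x* of (1 - A(x))."""
    unhedged = min(1.0 - grade for x, grade in a.items() if x != x_star)
    return min(alpha, unhedged)


a = {"x1": 1.0, "x2": 0.25, "x3": 0.0}
f = hedge(a, 0.5)   # hedging raises every grade to at least 1 - alpha

print(f)
print(opt(a, "x2", 0.5), pess(a, "x1", 0.5))
```

Setting `alpha` to 1 returns the unhedged certainty, while `alpha = 0` makes `hedge` return the
whole domain, which carries no information.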

We now consider an alternative method for representing a certainty quantified statement
using the Dempster-Shafer belief structure [22]. Here we represent the statement V is A is
$\alpha$-cert by the proposition V is m, where m is a D-S belief structure with two focal
elements, $B_1 = A$ and $B_2 = X$, having $m(B_1) = \alpha$ and $m(B_2) = 1 - \alpha$. In this
framework, we use the plausibility and belief measures to obtain our optimistic and pessimistic
bounds on the validity of the statement V is $x^*$. We recall that the plausibility and belief
measures are respectively the expected possibility and expected certainty.

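Pl[V is $x^*$ / V is m] = $\sum_{i=1}^{2} m(B_i)$ Poss[V is $x^*$ / V is $B_i$]

= $\alpha$ Poss[V is $x^*$ / V is A] + $\bar{\alpha}$ Poss[V is $x^*$ / V is X]

= $\alpha A(x^*) + \bar{\alpha} = 1 - \alpha \bar{A}(x^*)$

Bel[V is $x^*$ / V is m] = $\sum_{i=1}^{2} m(B_i)$ Cert[V is $x^*$ / V is $B_i$]

= $\alpha$ Cert[V is $x^*$ / V is A] + $\bar{\alpha}$ Cert[V is $x^*$ / V is X]

= $\alpha \min_{x \ne x^*}[\bar{A}(x)] + \bar{\alpha} \cdot 0 = \alpha \min_{x \ne x^*}[\bar{A}(x)]$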
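A small sketch of the Dempster-Shafer representation may help here. It assumes a dictionary
representation of A and a domain with more than one element, so that Cert[V is x* / V is X] = 0;
the function names are hypothetical.

```python
def plausibility(a, x_star, alpha):
    """Expected possibility over the focal elements A (mass alpha) and X (mass 1 - alpha)."""
    poss_given_a = a[x_star]
    poss_given_x = 1.0          # every element is fully possible under X
    return alpha * poss_given_a + (1.0 - alpha) * poss_given_x


def belief(a, x_star, alpha):
    """Expected certainty over the same two focal elements."""
    cert_given_a = min(1.0 - grade for x, grade in a.items() if x != x_star)
    cert_given_x = 0.0          # X alone gives no certainty (assumes |X| > 1)
    return alpha * cert_given_a + (1.0 - alpha) * cert_given_x


a = {"x1": 1.0, "x2": 0.25, "x3": 0.0}

print(plausibility(a, "x2", 0.5))   # alpha * A(x*) + (1 - alpha)
print(belief(a, "x1", 0.5))         # alpha * min over x != x* of (1 - A(x))
```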

We observe that, more generally, the pessimistic measure can be generalized using a
t-norm [23]. Thus if T is any t-norm then

Pess[V is $x^*$ / V is A is $\alpha$-cert] = $T[\alpha, \min_{x \ne x^*}[\bar{A}(x)]]$.

The optimistic measure can be generalized using a t-conorm [23]. Thus if S is any t-conorm then

Opt[V is $x^*$ / V is A is $\alpha$-cert] = $S[\bar{\alpha}, A(x^*)]$.

We shall not investigate here the issues involved in selecting among these possibilities.
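To make the generalization concrete, the following sketch passes the t-norm and t-conorm as
parameters. The function names and the choice of the product and probabilistic-sum operators are
assumptions made for illustration only.

```python
def pess_t(a, x_star, alpha, t_norm=min):
    """Pessimistic measure T[alpha, min over x != x* of (1 - A(x))]."""
    unhedged = min(1.0 - grade for x, grade in a.items() if x != x_star)
    return t_norm(alpha, unhedged)


def opt_s(a, x_star, alpha, t_conorm=max):
    """Optimistic measure S[1 - alpha, A(x*)]."""
    return t_conorm(1.0 - alpha, a[x_star])


def product(u, v):
    """Product t-norm."""
    return u * v


def probabilistic_sum(u, v):
    """t-conorm dual to the product: u + v - uv."""
    return u + v - u * v


a = {"x1": 1.0, "x2": 0.25, "x3": 0.0}

print(pess_t(a, "x1", 0.5), opt_s(a, "x2", 0.5))                              # Min / Max
print(pess_t(a, "x1", 0.5, product), opt_s(a, "x2", 0.5, probabilistic_sum))  # product pair
```

With the product t-norm and its dual t-conorm, the two values coincide with the belief and
plausibility obtained from the Dempster-Shafer representation of the same statement.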

6.  Multi-Source  Information  Fusion 

We now turn to the issue of aggregation of information from multiple sources,
Multi-Source Information Fusion (M-SIF).

If V is A and V is B are two pieces of information then their conjunction (fusion) is V is D
where $D = A \cap B$, that is, $D(x) = \min[A(x), B(x)]$. More generally, if V is $A_i$, for
i = 1 to q, are a collection of propositions from multiple sources then their conjunction is
V is D where $D = \bigcap_{i=1}^{q} A_i$; here $D(x) = \min_i[A_i(x)]$. We observe one
fundamental feature of this conjunction process. For all x, $D(x) \le A_i(x)$, that is,
$D \subseteq A_i$. More generally, if $D = \bigcap_{i=1}^{q} A_i$ and $E = D \cap A_{q+1}$ then
$E \subseteq D$, $E(x) \le D(x)$ for all x. Thus we see that the more information we get, the
smaller the fuzzy subsets.
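The conjunction can be sketched in a few lines. The dictionary representation and the names
`fuse` and `is_normal` are assumptions of this sketch; the example grades are made up.

```python
def fuse(fuzzy_sets):
    """Conjunction D(x) = min_i A_i(x) over a shared domain."""
    domain = list(fuzzy_sets[0])
    return {x: min(a[x] for a in fuzzy_sets) for x in domain}


def is_normal(d):
    """A fuzzy set is normal when some element has membership grade 1."""
    return max(d.values()) == 1.0


a1 = {"x1": 1.0, "x2": 0.6, "x3": 0.2}
a2 = {"x1": 1.0, "x2": 0.8, "x3": 0.0}
a3 = {"x1": 0.3, "x2": 1.0, "x3": 0.5}   # disagrees with a1 and a2 about x1

d12 = fuse([a1, a2])        # still normal: the two sources agree on x1
d123 = fuse([a1, a2, a3])   # subnormal: adding a3 introduces conflict

print(d12, is_normal(d12))
print(d123, is_normal(d123))
```

Each added source can only lower the grades, so `d123` is contained in `d12`.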

In using multiple sources of information, our objective is usually to increase the amount
of information we have about the variable of interest. We desire to increase the specificity. We
observe that if $D = \bigcap_{i=1}^{q} A_i$ is normal then $\mathrm{Sp}(D) \ge \mathrm{Sp}(A_i)$
for any i and we have gained information. Thus, if the information is not conflicting, then fusing
the information supplied by the multiple sources is a process which can't decrease the information
we have from any of the individual sources. In this case, D is usually more informative than any
of the individual sources.

However, if some of the source information is conflicting, this may result in a situation in
which D is subnormal, $\max_x D(x) < 1$. In this case, the fusion of the sources may provide us
with a situation in which we are more confused and our informativeness, specificity, has decreased.
Generally it is difficult dealing with situations in which we have conflicting source information.
One approach to addressing this situation is not to use all the information. That is, we selectively
choose which information to use and fuse. This requires adjudicating between the information
supplied by the different sources. Often the choice of the appropriate manner of adjudication
requires the use of subjective considerations on the part of the person ultimately responsible for
fusing the information. In the following, we shall suggest one approach to addressing this
problem. We should note that other approaches are possible.

Footnote 14: Here, unless otherwise stated, we shall assume the $A_i$ are normal, that is, they
have at least one element with membership grade 1.

As we shall subsequently see, this process generally involves a tradeoff between selecting
a subset of the available information that is not conflicting and yet large enough to provide a
credible fusion of the available information. The technique we shall suggest will make use of the
concept of a credibility measure to help in this process.

Let $P_i$ denote V is $A_i$, a piece of data about the variable V. We refer to the collection of
these as $P = \{P_1, ..., P_q\}$. We associate with P a measure $\mu: 2^P \to [0, 1]$ such that
for each subset B of P, $\mu(B)$ indicates the credibility of using as our fused knowledge the
conjunction of the data in B. We shall call $\mu$ the credibility measure. We can associate with
$\mu$ some basic properties: $\mu(\emptyset) = 0$ and $\mu(P) = 1$. Additionally, $\mu$ must be
monotonic: if $B_1 \subseteq B_2$ then $\mu(B_2) \ge \mu(B_1)$.

Assume B is a subset of P. Let $D_B = \bigcap_{P_i \in B} A_i$; it is the fusion of the
knowledge in B. We observe that using the subset B leads to the statement V is $D_B$. However,
any statement obtained by using only the information in B has a credibility of only $\mu(B)$.

In order to determine the quality of the knowledge obtained by using the subset B we must
consider two criteria. One criterion is that the knowledge provided by fusing the data in B is
informative and the other criterion is that B is credible. The degree of satisfaction of the
criterion of informativeness, Inf(B), can be obtained using the measure of specificity, thus
Inf(B) = $\mathrm{Sp}(D_B)$. We recall
$\mathrm{Sp}(D_B) = D_B(x^*) - \mathrm{Ave}_{X - \{x^*\}}(D_B)$, where $x^*$ is any element
having maximal membership grade in $D_B$. The credibility of using the subset B, Cred(B), can be
measured by $\mu(B)$. Since our measure of quality is an anding of these criteria, we can define
the measure of the quality of the result obtained using the subset B as Qual(B) = Inf(B) Cred(B),
thus Qual(B) = $\mathrm{Sp}(D_B)\,\mu(B)$.

An interesting alternative view of our measure Qual(B) can be obtained. In the preceding,
we indicated that a statement such as V is $D_B$ is $\mu(B)$-cert can be translated into V is F
where $F(x) = D_B(x) \vee (1 - \mu(B))$. We note that $\vee$ is an example of a t-conorm. It is
the Max t-conorm, $S(a, b) = \max(a, b) = a \vee b$. Let us consider the use of another t-conorm.
A particularly interesting one is $S(a, b) = a + b - ab$; this is called the bounded sum in [24]
(elsewhere it is usually called the probabilistic or algebraic sum). If we use this instead of
the Max we get $F_B(x) = D_B(x) + (1 - \mu(B)) - D_B(x)(1 - \mu(B))$. After a little algebra we get

$F_B(x) = (1 - \mu(B)) + \mu(B) D_B(x)$.

We note that it is monotonic with respect to $D_B(x)$: if $D_B(x_1) \ge D_B(x_2)$ then
$F_B(x_1) \ge F_B(x_2)$.

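Consider now the measure of specificity. We first recall

$\mathrm{Sp}(D_B) = D_B(x^*) - \mathrm{Ave}_{X - \{x^*\}}(D_B)$.

Consider now $\mathrm{Sp}(F_B)$. Since $x^*$ still provides the largest membership in $F_B$,

$\mathrm{Sp}(F_B) = F_B(x^*) - \mathrm{Ave}_{X - \{x^*\}}(F_B)$

$\mathrm{Sp}(F_B) = ((1 - \mu(B)) + \mu(B) D_B(x^*)) - \mathrm{Ave}_{x \ne x^*}[(1 - \mu(B)) + \mu(B) D_B(x)]$.

The nature of Ave is such that

$\mathrm{Ave}_{x \ne x^*}[(1 - \mu(B)) + \mu(B) D_B(x)] = (1 - \mu(B)) + \mu(B)\,\mathrm{Ave}_{X - \{x^*\}}(D_B)$.

Thus here

$\mathrm{Sp}(F_B) = ((1 - \mu(B)) + \mu(B) D_B(x^*)) - (1 - \mu(B)) - \mu(B)\,\mathrm{Ave}_{X - \{x^*\}}(D_B)$

$\mathrm{Sp}(F_B) = \mu(B)\,(D_B(x^*) - \mathrm{Ave}_{X - \{x^*\}}(D_B))$

$\mathrm{Sp}(F_B) = \alpha\,\mathrm{Sp}(D_B)$, where $\alpha = \mu(B)$.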

Thus using this definition for certainty qualification leads to a very nice result for the
relationship between the specificities of $D_B$ and $F_B$. This can be of great use in finding
the best solution to the fusion problem.
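The relationship between the two specificities can be checked numerically. The sketch below
assumes a dictionary representation of the fused set and a domain with at least two elements;
the helper names and the example grades are hypothetical.

```python
def specificity(d):
    """Sp(D) = D(x*) - average of D over X - {x*}, with x* a maximal element."""
    grades = sorted(d.values(), reverse=True)
    return grades[0] - sum(grades[1:]) / (len(grades) - 1)


def qualify(d, mu_b, t_conorm):
    """Certainty qualification F(x) = S(D(x), 1 - mu(B)) for a given t-conorm S."""
    return {x: t_conorm(grade, 1.0 - mu_b) for x, grade in d.items()}


def algebraic_sum(a, b):
    """The t-conorm S(a, b) = a + b - ab used in the derivation."""
    return a + b - a * b


d_b = {"x1": 1.0, "x2": 0.5, "x3": 0.0}   # fused set D_B
mu_b = 0.5                                # credibility mu(B)

f_b = qualify(d_b, mu_b, algebraic_sum)
f_max = qualify(d_b, mu_b, max)

print(specificity(f_b), mu_b * specificity(d_b))   # equal, as the derivation predicts
print(specificity(f_max))                          # the Max t-conorm gives a different value
```

Under these assumptions the quality Qual(B) is exactly the specificity of the certainty-qualified
set built with $S(a, b) = a + b - ab$; the Max t-conorm does not have this property in general.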

We make some observations about this process of multi-source fusion. First, observe that
if the whole collection of data P is such that $D_P$ is normal then for all B, since
$D_P \subseteq D_B$, $D_B$ is also normal. Hence, in this case,
$\mathrm{Sp}(D_P) \ge \mathrm{Sp}(D_B)$. Furthermore, since $\mu(P) = 1 \ge \mu(B)$, then

Qual(P) = $\mathrm{Sp}(D_P) \ge \mu(B)\,\mathrm{Sp}(D_B)$ = Qual(B).

Thus, in the case where the fusion of the data from all the sources doesn't induce any conflict,
the most informative thing to do is to use the fusion of all the data in P.

More generally, we make the following observation.

Observation: If $B_1$ is a subset of P such that $D_{B_1}$ is normal then for all subsets $B_2$
of P such that $B_2 \subseteq B_1$, Qual($B_1$) $\ge$ Qual($B_2$).

Justification: Since $B_2 \subseteq B_1$, then $D_{B_1} \subseteq D_{B_2}$ and
$\mu(B_1) \ge \mu(B_2)$. Since both are normal it follows that
$\mathrm{Sp}(D_{B_1}) \ge \mathrm{Sp}(D_{B_2})$ and hence Qual($B_1$) $\ge$ Qual($B_2$).

Definition: We shall call a subset B where $D_B$ is normal a non-conflicting subset. Furthermore,
we call a subset B maximally non-conflicting if B is non-conflicting and the addition of any
other piece of data to B results in a sub-normal fusion.

Observation:  Any  subset  B  of  data  containing  a  maximally  non-conflicting  subset  can't  provide 
the  best  fusion. 

We shall now consider some examples of the credibility measure. One special class of
credibility measures are those we call cardinality-based measures. For these measures no
distinction is made between the credibility of the different pieces of data; $\mu(B)$ depends
only on how many pieces of data are in B, the cardinality of B. We can define a cardinality-based
measure using a function $h: [0, 1] \to [0, 1]$ that satisfies $h(0) = 0$, $h(1) = 1$ and is
monotonic, $h(r_1) \ge h(r_2)$ if $r_1 > r_2$. Using h we can define
$\mu(B) = h\left(\frac{|B|}{|P|}\right)$. These types of functions are often obtained as a
representation of some linguistic quantifier such as "most" or "at least about half".

Another class of credibility measures are those that are completely additive. Here we
associate with each piece of data $P_i$ a value $\alpha_i \in [0, 1]$ and assume
$\sum_{i=1}^{q} \alpha_i = 1$. In this case $\mu(B) = \sum_{P_j \in B} \alpha_j$.
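The two classes of credibility measure described so far can be sketched as small factory
functions. The names, the choice $h(r) = r^2$ and the example weights are assumptions made for
illustration.

```python
def cardinality_measure(h, total):
    """mu(B) = h(|B| / |P|) for a monotonic h with h(0) = 0 and h(1) = 1."""
    return lambda subset: h(len(subset) / total)


def additive_measure(weights):
    """mu(B) = sum of the weights alpha_j of the data in B (the weights sum to one)."""
    return lambda subset: sum(weights[name] for name in subset)


data = ["P1", "P2", "P3", "P4"]

# h(r) = r * r is an arbitrary monotonic choice for this sketch.
mu_card = cardinality_measure(lambda r: r * r, len(data))
mu_add = additive_measure({"P1": 0.5, "P2": 0.25, "P3": 0.125, "P4": 0.125})

print(mu_card({"P1", "P2"}), mu_card(set(data)))
print(mu_add({"P1", "P3"}), mu_add(set(data)))
```

Both measures return 0 for the empty set and 1 for the full collection, and both are monotonic
with respect to set inclusion, as the basic properties require.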

Let $G_k$, k = 1 to g, be a collection of subsets of P that provides a partition of P. One
example of a credibility measure $\mu$ using this is one where $\mu(B) = 1$ if B contains at
least one piece of data from each of the $G_k$ and $\mu(B) = 0$ otherwise. Closely related to
this is a measure in which we associate with each $G_k$ a nonnegative value $g_k$ and define
$\mu(B) = \sum_{k=1}^{g} g_k \frac{|B \cap G_k|}{|B|}$. Here we also assume the $g_k$ sum to one.

Another type of credibility measure is one that contains a crucial piece of data. We say that
$P_j$ is crucial if $\mu(B) = 0$ whenever $P_j \notin B$.

Another interesting example of a credibility measure is the following. Let $B_1$ be a subset of
P. Consider a measure such that $\mu(B) = 0$ if $B_1 \cap B \ne \emptyset$ and
$B_1 \not\subseteq B$. This measure, which we call a balanced measure, requires that if we
include any data from $B_1$ in our fusion we must use all the data in $B_1$.

Let us summarize the procedure we suggested for providing a user with a quality fusion of
the data in the collection P. The first step is to calculate the subset $B^*$ of P with the
highest quality conjunction of its component data. That is, we find $B^*$ such that
Qual($B^*$) = $\max_{B \subseteq P}$[Qual(B)], where Qual(B) = $\mathrm{Sp}(D_B)\,\mu(B)$. Having
found this subset $B^*$, we indicate to the client that V is $D_{B^*}$ is the result of our
multi-source data fusion and that the credibility of this information is $\mu(B^*)$.
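The procedure summarized above can be sketched as an exhaustive search. This is only an
illustration under stated assumptions: fuzzy sets are dictionaries over a shared domain of at
least two elements, the credibility measure is a callable on a tuple of data names, and the
search enumerates all non-empty subsets, which is practical only for small q. The function name
`qual_fuse` and the example data are hypothetical.

```python
from itertools import combinations


def fuse(fuzzy_sets):
    """Conjunction: D(x) = min_i A_i(x)."""
    return {x: min(a[x] for a in fuzzy_sets) for x in fuzzy_sets[0]}


def specificity(d):
    """Sp(D) = D(x*) - average of D over X - {x*}, with x* a maximal element."""
    grades = sorted(d.values(), reverse=True)
    return grades[0] - sum(grades[1:]) / (len(grades) - 1)


def qual_fuse(data, mu):
    """Return (quality, subset, fused set) maximising Sp(D_B) * mu(B)."""
    best = None
    names = sorted(data)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            d_b = fuse([data[name] for name in subset])
            quality = specificity(d_b) * mu(subset)
            if best is None or quality > best[0]:
                best = (quality, subset, d_b)
    return best


data = {
    "P1": {"x1": 1.0, "x2": 0.6, "x3": 0.0},
    "P2": {"x1": 1.0, "x2": 0.0, "x3": 0.5},
    "P3": {"x1": 0.0, "x2": 1.0, "x3": 1.0},   # conflicts with P1 and P2
}


def mu(subset):
    """Cardinality-based credibility with h(r) = r (an arbitrary choice)."""
    return len(subset) / len(data)


quality, subset, fused = qual_fuse(data, mu)
print(subset, fused, mu(subset), quality)
```

In this example the full collection is completely conflicting, so the search settles on the
non-conflicting pair P1, P2 and reports it with the credibility of that pair.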

7.  Multiple  Fused  Values  from  Multi-Source  Data 

In  some  situations,  the  presentation  of  a  single  fused  value  may  not  be  sufficient  or 
appropriate.  Here  we  shall  suggest  a  process  that  will  allow  us  to  provide  multiple  fused  values 
over  the  data  set  P. 

Our point of departure is again a collection of multi-source data $P = \{P_1, ..., P_q\}$, each
piece of data $P_j$ being of the form V is $A_j$ where $A_j$ is a fuzzy subset of the domain of
V, X. In addition, we have a credibility measure $\mu: 2^P \to [0, 1]$ where $\mu(B)$ is the
degree of credibility assigned to a fusion using the data in the subset B of P.

In the preceding, we defined a process for obtaining an optimal subset $B_1$, which
provided a fused value V is $D_{B_1}$ with credibility $\mu(B_1)$. Here
$D_{B_1} = \bigcap_{j,\, P_j \in B_1} A_j$. This approach finds the subset of data $B_1$ such
that Qual($B_1$) = $\mu(B_1)\,\mathrm{Sp}(D_{B_1}) = \max_{B \subseteq P}[\mu(B)\,\mathrm{Sp}(D_B)]$.

We shall refer to this process as Qual-Fuse(P, $\mu$). Thus Qual-Fuse(P, $\mu$) returns $B_1$,
which enables the determination of $D_{B_1}$ and $\mu(B_1)$.

In the following, we shall suggest a procedure which allows for the generation of multiple
fusions from the pair (P, $\mu$). For notational convenience, in the following we shall denote
the fuzzy subsets $A_j$ as $A_j^1$; thus our data is still $P = \{P_1, ..., P_q\}$ where $P_j$
corresponds to the observation V is $A_j^1$. $\mu$ is still a credibility measure over P.

The basic algorithm of our procedure is as follows.

1. Initialize our system with P, $\mu$ and set i = 1.

2. Apply Qual-Fuse(P, $\mu$); this returns $B_1$ and $D_{B_1}$, and the statement V is $D_{B_1}$
with credibility $\mu(B_1)$.

3. Revise each of the $P_j$ to V is $A_j^2$ where $A_j^2 = A_j^1 - D_{B_1}$. That is, we remove
the subset $D_{B_1}$ from the subset $A_j^1$. We recall $A_j^1 - D_{B_1} = A_j^1 \cap \bar{D}_{B_1}$
and therefore $A_j^2(x) = \min[A_j^1(x), 1 - D_{B_1}(x)]$.

4. Set i = 2.

5. Let $P = [P_1, ..., P_q]$ with $P_j$ such that V is $A_j^i$.

6. Apply Qual-Fuse(P, $\mu$). This returns $B_i$ and the statement V is $D_{B_i}$ with
credibility $\mu(B_i)$. Here $D_{B_i} = \bigcap_{j,\, P_j \in B_i} A_j^i$.

7. Additional fusion desired? No - stop, Yes - continue.

8. Set i = i + 1.

9. Calculate $A_j^i = A_j^{i-1} - D_{B_{i-1}}$.

10. Go to step 5.
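The steps above can be sketched as a loop around an exhaustive Qual-Fuse search. This is a
minimal sketch, assuming dictionary-based fuzzy sets, a fixed number of rounds as the stopping
rule, and a cardinality-based credibility measure; the names `multi_fuse` and `remove` and the
example data are hypothetical.

```python
from itertools import combinations


def fuse(fuzzy_sets):
    """Conjunction: D(x) = min_i A_i(x)."""
    return {x: min(a[x] for a in fuzzy_sets) for x in fuzzy_sets[0]}


def specificity(d):
    """Sp(D) = D(x*) - average of D over X - {x*}, with x* a maximal element."""
    grades = sorted(d.values(), reverse=True)
    return grades[0] - sum(grades[1:]) / (len(grades) - 1)


def qual_fuse(data, mu):
    """Exhaustive search for the subset B maximising Sp(D_B) * mu(B)."""
    candidates = []
    names = sorted(data)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            d_b = fuse([data[name] for name in subset])
            candidates.append((specificity(d_b) * mu(subset), subset, d_b))
    return max(candidates, key=lambda c: c[0])


def remove(a, d):
    """A - D: A intersected with the complement of D, min(A(x), 1 - D(x))."""
    return {x: min(a[x], 1.0 - d[x]) for x in a}


def multi_fuse(data, mu, rounds):
    """Repeat Qual-Fuse, removing each fused value from all data before the next round."""
    results = []
    current = dict(data)
    for _ in range(rounds):
        _, subset, fused = qual_fuse(current, mu)
        results.append((subset, fused, mu(subset)))
        current = {name: remove(a, fused) for name, a in current.items()}
    return results


data = {
    "P1": {"x1": 1.0, "x2": 0.6, "x3": 0.0},
    "P2": {"x1": 1.0, "x2": 0.0, "x3": 0.5},
    "P3": {"x1": 0.0, "x2": 1.0, "x3": 1.0},
}


def mu(subset):
    """Cardinality-based credibility with h(r) = r (an arbitrary choice)."""
    return len(subset) / len(data)


results = multi_fuse(data, mu, rounds=2)
for subset, fused, credibility in results:
    print(subset, fused, credibility)
```

In this example the first round fuses P1 and P2 around x1; after that value is removed, the
second round fuses P1 and P3 around x2, and the two fused sets do not overlap.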

The final result of this process is a collection of fused values of the form

V is $D_{B_1}$ with credibility $\mu(B_1)$

V is $D_{B_2}$ with credibility $\mu(B_2)$

$\vdots$

V is $D_{B_i}$ with credibility $\mu(B_i)$


The key idea we suggested here is the removal of the already presented fused value from
the data remaining to be used to fuse. This is very much in the spirit of the Mountain Clustering
method [25, 26]. This removal process tends to result in a situation where the $D_{B_j}$ are
disjoint.


An interesting issue, one which we shall not investigate in detail here, is when to stop the
process of providing additional fusions. In its simplest form, this can be just based on an input
from the user, for example how many fused values they want. A more computationally based
approach could be one in which we stop when the quality of the next proposed fusion falls below
some level.

8.  Fusing  Probabilistic  and  Possibilistic  Data

An important issue in the field of data fusion concerns the combination of two
pieces of information where one is expressed in terms of a fuzzy subset (possibility distribution)
and the other is expressed in terms of a probability distribution [27, 28]. Here we shall
introduce some ideas related to this problem.

Let G be an attribute that is associated with some class of objects Z. Let X be the domain
in which this attribute takes its value. Our interest here is in the determination of the value of
the attribute G for some specific entity, $z^*$, from this class. Thus we are interested in the
determination of the value of the variable $G(z^*)$. We shall denote this variable as $G^*$.

Consider a piece of data about $G^*$ such as $G^*$ is A, where A is a fuzzy subset of X. Let's
look at this data more carefully. First, we see it is directly about the variable of interest.
That is, it is a statement about the attribute for the object of interest. Often, this information
is a result of some linguistically expressed observation such as "The bomb thrower was young." As
noted by Zadeh [29], this statement puts some constraint on the possible values of the variable
$G^*$. It generalizes the idea of having a more crisp statement such as "The age of the bomb
thrower was between 18 and 25." It of course reflects some uncertainty with respect to the
source's observation. In the situation in which A is assumed normal, this uncertainty can be
measured by the cardinality of the subset A, $\sum_x A(x)$. In the case where we must deal with
subnormality, a more sophisticated measure such as Un(A) = 1 - Sp(A) = 1 - (Max(A) - Ave(A))
should be used. We see that if Max(A) = 1 then Un(A) = Ave(A), which is essentially $\sum_x A(x)$.

Let us now turn to the situation in which we have additional probabilistic information
consisting of a probability distribution P over the space X, where $P(x_i)$ is the probability
associated with the attribute value $x_i$. In order to find a basis for fusing these two pieces
of information, the possibility distribution A and the probability distribution P, we shall take
advantage of a view proposed by Coletti and Scozzafava [30]. In [30] the authors suggested that
an element's membership grade in a fuzzy subset, $A(x_i)$, can be viewed as the conditional
probability of A given $x_i$, $P(A/x_i) = A(x_i)$. Having this allows us to use Bayes' rule to
generate the fused information. Let $P(x/A, P)$ indicate the probability of x given our two
pieces of knowledge. In particular,

$P(x/A, P) = \frac{P(A/x)}{P(A)} P(x)$.

Using $P(A/x) = A(x)$ we have $P(x/A, P) = \frac{A(x)}{P(A)} P(x)$. Furthermore, since
$P(A) = \sum_{i=1}^{n} P(A/x_i) P(x_i)$, then P(A) can be expressed as
$\sum_{i=1}^{n} A(x_i) P(x_i)$. Using this, we get

$P(x/A, P) = \frac{A(x) P(x)}{\sum_i A(x_i) P(x_i)}$.

At times we shall find it convenient to express this as

$P(x/A, P) = \frac{A(x)}{\sum_i A(x_i) \frac{P(x_i)}{P(x)}}$.

Thus the result of fusing these two pieces of data is a probability distribution with respect
to the value of $G^*$. Using the notation suggested by Zadeh in [29], we can express this as
$G^*$ isp R, where R indicates a probability distribution on X such that $P(x/A, P)$, as defined
above, is the probability that $G^*$ assumes the value x. The fact that this is the case is not
surprising, since the knowledge in the possibility distribution is actually saying that the value
of the variable $G^*$ lies in a set, A. So we are actually finding the probability of x
conditioned on the knowledge that $G^*$ lies in a set.

Let us look at this for some special cases to see if it is consistent with our intuition. First,
consider the case where $P(x_i) = \frac{1}{n}$. Here, the probability distribution is essentially
providing no information. In this case, we have $\frac{P(x_i)}{P(x)} = 1$ for all $x_i$ and hence
$P(x/A, P) = \frac{A(x)}{\sum_{i=1}^{n} A(x_i)}$. Thus here we obtain $P(x/A, P)$ as simply a
normalization of the possibility distribution.

Consider now the case in which $A(x_i) = 1$ for all $x_i$. Here the possibility distribution is
providing no information. In this case $P(x/A, P) = \frac{P(x)}{\sum_i P(x_i)} = P(x)$. We get
back the original probability distribution.

Consider the case where A corresponds to some crisp subset B of X. That is, $A(x_i) = 1$ for
$x_i \in B$ and $A(x_i) = 0$ otherwise. In this case, for $x \in B$,
$P(x/A, P) = \frac{P(x)}{\sum_{x_i \in B} P(x_i)}$. This is the classic case of conditional
probability.

One issue that must be addressed is conflicting information. Consider the case where we
have $A(x_1) = 1$ and $A(x_j) = 0$ for all other $x_j$, and where $P(x_1) = 0$. In this case, we
see that $P(x_i) A(x_i) = 0$ for all $x_i$ and our aggregation leads to a kind of indeterminism.
Here, we essentially must decide: do we believe the possibility distribution, which says the
answer is definitely $x_1$, or do we believe the probability distribution, which says the answer
is definitely not $x_1$?

Another form of conflict can be seen in the following case. Let
$A = \{\frac{1}{x_1}, \frac{0.1}{x_2}, \frac{0}{x_3}\}$ and let the probabilistic information be
such that $P(x_1) = 0$, $P(x_2) = 0.1$ and $P(x_3) = 0.9$. In this case we obtain
$P(x_1/A, P) = 0$, $P(x_2/A, P) = 1$ and $P(x_3/A, P) = 0$. This may be somewhat disturbing.
Here, while both pieces of information lend little support to $x_2$, their combination leads to
its strong support.
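Both forms of conflict can be reproduced with a short sketch of the unmodified aggregation. The
dictionary representation, the function name and the decision to return `None` for the
indeterminate case are assumptions made for illustration.

```python
def fuse_bayes(a, p):
    """P(x / A, P) = A(x) P(x) / sum_i A(x_i) P(x_i), or None when the sum is zero."""
    weights = {x: a[x] * p[x] for x in a}
    total = sum(weights.values())
    if total == 0:
        return None                        # every product A(x) P(x) vanishes
    return {x: w / total for x, w in weights.items()}


p = {"x1": 0.0, "x2": 0.1, "x3": 0.9}

# First case: A says "definitely x1" while P says "definitely not x1".
indeterminate = fuse_bayes({"x1": 1.0, "x2": 0.0, "x3": 0.0}, p)

# Second case: neither source supports x2 strongly, yet the fusion puts all mass on it.
disturbing = fuse_bayes({"x1": 1.0, "x2": 0.1, "x3": 0.0}, p)

print(indeterminate)
print(disturbing)
```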

In order to address this issue of conflict we must first consider the context in which we
obtain probabilistic information. We can envision two situations when we obtain probabilistic
information. One of these is in a frequentist spirit and the other is of a subjective kind.

One situation where we have probabilistic information is where the probability distribution
is a reflection of some observation about the attribute G over the objects in the class Z. Thus
here $P(x_j)$ is the probability that an object in Z has value for attribute G equal to $x_j$.
For example, if $x_j = 26$, then P(26) is the probability that "a" bomb thrower is 26. The point
we want to emphasize here is that this information is not directly about the entity of interest
$z^*$. It is not information about our variable of interest $G^*$, $G(z^*)$. Although it is
useful and valuable information, it is not directly about the object of interest. The important
observation here is that the information contained in this type of probabilistic information is
of a lower priority than the direct information contained in a statement $G^*$ is A. Thus, here
there is a priority ordering with respect to our information, and in the face of conflict we want
to give preference to the direct information, $G^*$ is A.

The use of a probabilistic representation can also occur in the case in which the source is
providing information directly about the attribute value for the object of interest. Consider the
situation where the source has some uncertainty with regard to the actual value of the variable
$G^*$. Here, he uses the probability framework to express his perception of the uncertainty. He
is saying that his feeling about the uncertainty associated with the value of $G^*$ is similar to
that of a random experiment in which $P(x_j)$ is the probability that $G^* = x_j$. Again, in this
situation, the information provided by the source is also less direct than that provided by the
observation that $G^*$ is A.

The overall point we want to make here is that often the information provided using a
probabilistic representation has a lesser priority than that provided using the fuzzy
representation. This is not to say that fuzzy sets are better than probability but only that the
type of information represented by a probability distribution is less directly relevant.

This  distinction  in  the  priority  of  the  two  different  kinds  of  information  allows  us  to 
provide  a  reformulation  of  the  aggregation  of  these  two  kinds  of  information  to  allow  for  an 
intelligent  adjudication  of  conflicts.  As  a  first  step  in  this  process,  we  shall  turn  to  the  issue  of 
measuring  the  conflict  or  conversely  the  consistency  between  a  probability  distribution  and  a 
possibility  distribution. 

Let $\Pi: X \to [0, 1]$ be a possibility distribution over X; thus $\Pi(x_i)$ indicates the
possibility of $x_i$. Here, we shall assume this is normal: there exists some $x^*$ such that
$\Pi(x^*) = 1$. Let $P: X \to [0, 1]$ be a probability distribution over X. $P(x_i)$ indicates
the probability of $x_i$. The probability distribution has the added requirement that
$\sum_i P(x_i) = 1$. Let $p = \max_i[P(x_i)]$; it is the maximal probability associated with P.
We can observe that $\frac{1}{n} \le p \le 1$, where n is the cardinality of X. It is well known
that the negation of the Shannon entropy, $\sum_i P(x_i) \ln[P(x_i)]$, provides a measure of the
information content of a probability distribution. What is worth pointing out is that
$\max_i(P(x_i))$ provides an alternative measure of this information content [31, 32]. While the
Shannon measure has some properties that make it preferred, especially when we consider multiple
distributions, in the case when we are focusing on one probability distribution,
$\max_i(P(x_i))$ provides a simple and acceptable measure of the information content of a
probability distribution.

We now introduce a measure called the consistency of $\Pi$ and P:

Consist($\Pi$, P) = $\max_i[\Pi(x_i) \wedge \hat{P}(x_i)]$, where $\hat{P}(x_i) = \frac{P(x_i)}{p}$
and $p = \max_i[P(x_i)]$.

We observe that if P is such that $P(x_i) = \frac{1}{n}$ for all $x_i$, then $p = \frac{1}{n}$
and $\hat{P}(x_i) = 1$ for all $x_i$. In this case both $\Pi(x^*) = 1$ and $\hat{P}(x^*) = 1$
and hence Consist($\Pi$, P) = 1. Thus, when P has maximal uncertainty it is consistent with any
possibility distribution. On the other hand, we see that if $P(x_i) = 1$ and $\Pi(x_i) = 0$ then
Consist($\Pi$, P) = 0; they are in complete conflict. In the case where $X = \{x_1, x_2, x_3\}$
and $\Pi(x_1) = 1$, $\Pi(x_2) = 0.1$ and $\Pi(x_3) = 0$ while $P(x_1) = 0$, $P(x_2) = 0.1$ and
$P(x_3) = 0.9$, we get $\hat{P}(x_1) = 0$, $\hat{P}(x_2) = 0.11$ and $\hat{P}(x_3) = 1$ and hence
Consist($\Pi$, P) = 0.1.
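The consistency measure can be sketched directly from its definition. The dictionary
representation and the function name `consistency` are assumptions of the sketch; the first
example uses the three-element case from the text.

```python
def consistency(pi, p):
    """Consist(Pi, P) = max_i min(Pi(x_i), P(x_i) / max_j P(x_j))."""
    p_max = max(p.values())
    return max(min(pi[x], p[x] / p_max) for x in pi)


pi = {"x1": 1.0, "x2": 0.1, "x3": 0.0}
p = {"x1": 0.0, "x2": 0.1, "x3": 0.9}
print(consistency(pi, p))

# A uniform P is consistent with any normal possibility distribution.
uniform = {x: 1.0 / 3.0 for x in pi}
print(consistency(pi, uniform))

# Complete conflict: all probability mass lies where the possibility is zero.
print(consistency({"x1": 1.0, "x2": 0.0, "x3": 0.0}, {"x1": 0.0, "x2": 0.5, "x3": 0.5}))
```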

We now consider the modification of the procedure for aggregating possibility and
probability distributions which uses this measure of consistency to aid in the adjudication of
conflicting information.

In the preceding, we defined the aggregation of V is A and the probability distribution P as
inducing a probability distribution where

$P(x/A, P) = \frac{A(x) P(x)}{\sum_{i=1}^{n} A(x_i) P(x_i)}$.


We now provide a modification of this to account for conflicts between the input distributions. As we shall see, this will give priority to the information V is A.

Letting $\alpha = \mathrm{Consist}(P, A)$ and $\bar{\alpha} = 1 - \alpha$, we define

$$P(x/A, P) = \frac{A(x)\,[\alpha P(x) + \bar{\alpha}\,\tfrac{1}{n}]}{\sum_{j=1}^{n} A(x_j)\,[\alpha P(x_j) + \bar{\alpha}\,\tfrac{1}{n}]}.$$

Let us see how this works. If the two sources are consistent, $\alpha = 1$, then

$$P(x/A, P) = \frac{A(x)\,P(x)}{\sum_{j=1}^{n} A(x_j)\,P(x_j)}$$

and we get our original formulation. If the two pieces of information are completely conflicting, $\alpha = 0$, we get

$$P(x/A, P) = \frac{A(x)\,\tfrac{1}{n}}{\sum_{j=1}^{n} A(x_j)\,\tfrac{1}{n}} = \frac{A(x)}{\sum_{j=1}^{n} A(x_j)}.$$

Here we completely discount the information contained in the probability distribution P and simply obtain P(x/A, P) as a normalization of A.

Here, we shall refer to $F(\alpha, P(x_j)) = \alpha P(x_j) + \bar{\alpha}\,\tfrac{1}{n}$, with $\bar{\alpha} = 1 - \alpha$, as the probability transform and refer to $L(x_j) = F(\alpha, P(x_j))$ as the transformed probabilities. We see that in the face of conflict the transformed probabilities move toward $1/n$.

We further observe that if $A(x_j) = 0$, then $P(x_j/A, P) = 0$.

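Example: Assume $X = \{x_1, x_2, x_3\}$, $A = \{\frac{1}{x_1}, \frac{0.1}{x_2}, \frac{0}{x_3}\}$ and $P(x_1) = 0$, $P(x_2) = 0.1$ and $P(x_3) = 0.9$. Here we get $\tilde{P}(x_1) = 0$, $\tilde{P}(x_2) = 0.11$ and $\tilde{P}(x_3) = 1$ and hence $\mathrm{Consist}(A, P) = 0.1$. In this case, with $n = 3$, the transformed probabilities are:

$$L(x_1) = (0.1)(0) + (0.9)\tfrac{1}{3} = 0.3$$

$$L(x_2) = (0.1)(0.1) + (0.9)\tfrac{1}{3} = 0.31$$

$$L(x_3) = (0.1)(0.9) + (0.9)\tfrac{1}{3} = 0.39$$

In this case $\sum_{i=1}^{3} A(x_i)\,L(x_i) = 0.3 + 0.031 = 0.331$ and hence

$$P(x_1/A, P) = \frac{0.3}{0.331} = 0.906, \quad P(x_2/A, P) = \frac{0.031}{0.331} = 0.094 \quad \text{and} \quad P(x_3/A, P) = \frac{0}{0.331} = 0.$$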
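The consistency-modified aggregation can be checked numerically. The following Python sketch is a minimal illustration under our own naming assumptions (`consist`, `fuse_with_consistency`); it represents A and P as lists over the same ordered domain and reproduces the figures of the three-element example.

```python
def consist(poss, prob):
    """Consistency: max over j of min(poss_j, prob_j / max(prob))."""
    p_hat = max(prob)
    return max(min(a, p / p_hat) for a, p in zip(poss, prob))


def fuse_with_consistency(A, P):
    """Fuse possibility distribution A with probability distribution P.

    Each probability is moved toward 1/n in proportion to the conflict
    (1 - alpha), weighted by A, and the result is normalized.
    """
    n = len(P)
    alpha = consist(A, P)
    transformed = [alpha * p + (1 - alpha) / n for p in P]
    weighted = [a * t for a, t in zip(A, transformed)]
    total = sum(weighted)
    if total == 0:
        raise ValueError("A assigns zero possibility to every element")
    return [w / total for w in weighted]


A = [1.0, 0.1, 0.0]
P = [0.0, 0.1, 0.9]
print([round(v, 3) for v in fuse_with_consistency(A, P)])  # [0.906, 0.094, 0.0]
```

Note that an element with zero possibility receives zero fused probability regardless of P, which is the priority given to the information V is A.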




We must consider one other issue here. We have implicitly assumed that the possibility distribution is normal, $\max_j A(x_j) = 1$. If this is not the case some problems can arise. Since

$$\mathrm{Consist}(A, P) = \max_j\left[A(x_j) \wedge \frac{P(x_j)}{\hat{p}}\right] \le \max_j\,[A(x_j)],$$

where $\hat{p} = \max_j P(x_j)$, our maximal possible consistency goes down. Here the reduced consistency may be an issue related to the internal conflict of the possibility distribution rather than its incompatibility with the probability distribution.

It may be interesting to consider a slight modification in the case where we have $\max_j[A(x_j)] = a^* < 1$. Here, instead of the end result being a probability distribution, we end up with a Dempster-Shafer belief structure m. This belief structure has n + 1 focal elements: $B_j = \{x_j\}$ for j = 1 to n and $B_{n+1} = X$. Furthermore, for j = 1 to n we have $m(B_j) = a^*\,P(x_j/A, P)$, where the $P(x_j/A, P)$ are calculated as in the preceding. For $B_{n+1} = X$ we have $m(X) = 1 - a^*$. We shall not pursue this but leave it as a suggestion.
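To make the suggested belief structure concrete, the following Python sketch builds the n + 1 focal elements and their masses from already-fused probabilities. The function name, the representation of focal elements as frozensets, and the sample numbers are all hypothetical; the fused probabilities are taken as given rather than recomputed.

```python
def belief_structure(elements, fused, a_star):
    """Return a list of (focal_element, mass) pairs.

    elements: the domain X as a list of labels
    fused:    the fused probabilities P(x_j / A, P), summing to one
    a_star:   the height of the subnormal possibility distribution, Max_j A(x_j)
    """
    if not 0.0 <= a_star <= 1.0:
        raise ValueError("a_star must lie in [0, 1]")
    # Singleton focal elements carry the fused probabilities scaled by a_star.
    focal = [(frozenset([x]), a_star * p) for x, p in zip(elements, fused)]
    # The remaining mass is assigned to the whole domain X.
    focal.append((frozenset(elements), 1.0 - a_star))
    return focal


# Hypothetical inputs, for illustration only.
m = belief_structure(["x1", "x2", "x3"], [0.5, 0.3, 0.2], a_star=0.8)
for focal_element, mass in m:
    print(sorted(focal_element), round(mass, 3))
```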


9.  Alternative  Measures  of  Certainty 

Here we consider a more technical issue that may not be of interest to all readers. We want to look a little more deeply at the issue of deciding whether some subset B of X contains the value V given that we know that V is A, which is the basis of our definition of the measure Cert[V is B / V is A]. Here, we shall, unless otherwise stated, assume A is normal.

Our definition for the measure, $\mathrm{Cert}[V \text{ is } B \,/\, V \text{ is } A] = \min_x[\bar{A}(x) \vee B(x)]$ with $\bar{A}(x) = 1 - A(x)$, is an extremely pessimistic measure. As we see in the crisp case, as long as there is one element not in B that is possible, in A, it scores a value of zero. We see that if A = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} then with B = {1} or B = {1, 2, ..., 9} we get the same degree of certainty, zero. Here, no consideration is given to how many elements in A are not in B, only to whether there exists one.

We observe that our definition of certainty is the degree of truth of the proposition
"All elements not in B are not possible given A."
which we expressed as $\min_x[\bar{A}(x) \vee B(x)]$, where $\bar{A}(x) = 1 - A(x)$. In the case where B is crisp this becomes

$$\mathrm{Cert}[V \text{ is } B \,/\, V \text{ is } A] = \min_{x \notin B}\,[\bar{A}(x)]$$
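The pessimism of this measure is easy to see numerically. The sketch below is a minimal illustration with assumed names; it represents A and B as dictionaries of membership grades and, as a further assumption of ours, takes the domain to be X = {1, ..., 12} so that some elements lie outside A.

```python
def cert(A, B):
    """Cert[V is B / V is A] = min over x of max(1 - A(x), B(x))."""
    return min(max(1 - A[x], B[x]) for x in A)


# Assumed domain X = {1, ..., 12}; A is the crisp set {1, ..., 10}.
X = range(1, 13)
A = {x: 1.0 if x <= 10 else 0.0 for x in X}


def crisp(members):
    """Membership function of a crisp subset of X."""
    return {x: 1.0 if x in members else 0.0 for x in X}


print(cert(A, crisp({1})))              # 0.0
print(cert(A, crisp(set(range(1, 10)))))  # 0.0, although only one possible element is missed
print(cert(A, crisp(set(range(1, 11)))))  # 1.0, B covers every possible element
```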

Prade and Yager [33] suggested a softening of the measure of certainty with the concept of expectedness. In [33] they introduced the idea of expectedness of V is B given V is A, denoted Exp[B/A], which they defined as a degree of truth of the proposition

"Most of the elements not in B are not possible given A."

In the case where B is crisp we can express this as

$$\mathrm{Exp}[V \text{ is } B \,/\, V \text{ is } A] = \mathrm{Most}_{x \notin B}\,[\bar{A}(x)]$$

We note that the difference between the two concepts, certainty and expectedness, is the respective uses of the terms all and most. We observe that these two terms are examples of what Zadeh called linguistic quantifiers [34]. Here, we shall suggest a parameterized formulation which leads to a generalization of these types of measures. Let Q indicate a general member of the class of regular monotonic linguistic quantifiers [35]. Using this we introduce the idea of what we shall denote as Q-Cert[V is B / V is A] or more succinctly Q-Cert[B/A]. Specifically we define Q-Cert[B / A] as the truth of the statement

"Q of the elements not in B are not possible given A."

First, we note that, as suggested by Zadeh, a linguistic quantifier Q can be expressed as a fuzzy subset $Q: [0, 1] \to [0, 1]$ where Q(r) indicates the degree to which the proportion r satisfies the concept Q. The fact that Q is a regular monotonic linguistic quantifier requires that Q satisfy the additional three conditions: $Q(0) = 0$, $Q(1) = 1$ and $Q(x) \ge Q(y)$ if $x > y$.

We note some special cases of Q. The first is $Q_*$ where $Q_*(1) = 1$ and $Q_*(x) = 0$ for all $x \ne 1$. This corresponds to the linguistic quantifier all. The second special case is $Q^*$ where $Q^*(0) = 0$ and $Q^*(x) = 1$ for all $x \ne 0$. This corresponds to the quantifier any. Another special case is $Q_a$ where $Q_a(x) = x$. It is suggested that this models the linguistic quantifier some. Furthermore it is suggested that $Q_a$ corresponds to the quantifier that is implicit when no quantifier is explicitly expressed; it is a kind of default quantifier.
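The three special quantifiers can be written directly as Python functions. This is a minimal sketch under our own naming assumptions (`q_all`, `q_any`, `q_some`); the helper `is_regular_monotonic` only spot-checks the three conditions on a finite grid and is not a proof.

```python
def q_all(r):
    """Quantifier 'all': satisfied only by the full proportion."""
    return 1.0 if r == 1 else 0.0


def q_any(r):
    """Quantifier 'any': satisfied by every nonzero proportion."""
    return 0.0 if r == 0 else 1.0


def q_some(r):
    """Identity quantifier Q(x) = x, suggested as the default."""
    return float(r)


def is_regular_monotonic(Q, steps=100):
    """Spot-check Q(0) = 0, Q(1) = 1 and monotonicity on an evenly spaced grid."""
    values = [Q(i / steps) for i in range(steps + 1)]
    nondecreasing = all(a <= b for a, b in zip(values, values[1:]))
    return values[0] == 0 and values[-1] == 1 and nondecreasing


for Q in (q_all, q_any, q_some):
    print(Q.__name__, is_regular_monotonic(Q))
```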

We shall formally define the truth of the proposition "Q of the elements not in B are not possible given A," Q-Cert[V is B / V is A], using the importance weighted OWA operator [36]. We first recall this operator.

Let $(c_j, d_j)$ be a two-tuple in which $c_j$ is called the importance and $d_j$ is called the argument value. We recall that the OWA aggregation of a collection of these tuples guided by a quantifier Q, $\mathrm{OWA}_Q[(c_1, d_1), (c_2, d_2), \ldots, (c_n, d_n)]$, is defined as

$$\mathrm{OWA}_Q[(c_1, d_1), (c_2, d_2), \ldots, (c_n, d_n)] = \sum_{j=1}^{n} w_j\, d_{\sigma(j)}$$

where $\sigma(j)$ is the index of the jth largest of the $d_i$ and

$$w_j = Q\!\left(\frac{T_j}{T}\right) - Q\!\left(\frac{T_{j-1}}{T}\right),$$

where $T = \sum_{i=1}^{n} c_i$ is the sum of all importances and $T_j = \sum_{i=1}^{j} c_{\sigma(i)}$ (so that $T_0 = 0$).
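The importance weighted OWA operator recalled above can be sketched as follows. The function name `owa_q` and the sample tuples are hypothetical; the code orders the tuples by decreasing argument value and accumulates the importances to obtain each weight.

```python
def owa_q(pairs, Q):
    """Importance weighted OWA aggregation guided by quantifier Q.

    pairs: list of (importance, argument value) tuples
    Q:     function from [0, 1] to [0, 1]
    """
    total = sum(c for c, _ in pairs)
    if total == 0:
        raise ValueError("the importances must not all be zero")
    # Order the tuples by decreasing argument value.
    ordered = sorted(pairs, key=lambda cd: cd[1], reverse=True)
    result = 0.0
    running = 0.0       # T_j, cumulative importance in the ordered list
    previous = Q(0.0)   # Q(T_{j-1} / T), starting from T_0 = 0
    for c, d in ordered:
        running += c
        current = Q(running / total)
        result += (current - previous) * d  # w_j times the jth largest argument
        previous = current
    return result


# Hypothetical tuples, for illustration only.
pairs = [(1, 0.2), (2, 0.9), (1, 0.5)]
print(owa_q(pairs, lambda r: r))       # importance weighted average
print(owa_q(pairs, lambda r: r ** 2))  # places more weight on the smaller arguments
```

With Q(r) = r the operator reduces to the ordinary importance weighted average, which gives a convenient sanity check.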


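In the following we shall express Q-Cert[V is B / V is A] using this operation. For notational convenience we assume the domain of V is $X = \{x_1, \ldots, x_n\}$, with $B(x_j) = b_j$ and $A(x_j) = a_j$. Using this notation, $Q\text{-Cert}[V \text{ is } B \,/\, V \text{ is } A] = \mathrm{OWA}_Q[(\bar{b}_j, \bar{a}_j)]$ where $\bar{b}_j = 1 - b_j$ and $\bar{a}_j = 1 - a_j$; thus

$$Q\text{-Cert}[V \text{ is } B \,/\, V \text{ is } A] = \sum_{j=1}^{n} w_j\, \bar{a}_{\sigma(j)}$$

where $\sigma(j)$ is the index of the jth largest of the $\bar{a}_i$ and $w_j = Q\!\left(\frac{T_j}{T}\right) - Q\!\left(\frac{T_{j-1}}{T}\right)$, where $T_j = \sum_{i=1}^{j} \bar{b}_{\sigma(i)}$ and $T = \sum_{k=1}^{n} \bar{b}_k$.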


Let us consider the environment when B is crisp. Here $b_j = 1$ if $x_j \in B$ and $b_j = 0$ if $x_j \notin B$. Thus $\bar{b}_j = 0$ if $x_j \in B$ and $\bar{b}_j = 1$ if $x_j \notin B$. In this situation, we also observe that if $x_{\sigma(j)} \in B$ then $\bar{b}_{\sigma(j)} = 0$, and since $T_j = \sum_{i=1}^{j} \bar{b}_{\sigma(i)}$, in this case, $x_{\sigma(j)} \in B$, we have $T_j = T_{j-1}$ and hence $w_j = 0$. Thus we see that all terms that are in B have OWA weights equal to zero. Furthermore, for those elements not in B we have $\bar{b}_{\sigma(j)} = 1$, and $T_j$ is the number of elements, up to and including the one with the jth largest $\bar{a}_j$, that are not in B. Effectively, for those $x_{\sigma(j)} \notin B$ we have $T_j = 1 + T_{j-1}$ and for those $x_{\sigma(j)} \in B$ we have $T_j = T_{j-1}$.


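Thus in the case where B is crisp we can just consider those elements not lying in B. We shall let $\bar{n} = |\bar{B}|$, where $\bar{B}$ is the complement of B, and let $\bar{\sigma}(j)$ be the index of the element having the jth largest value for $\bar{a}_j$ of those lying in $\bar{B}$. Then, in this case where B is crisp, we have

$$Q\text{-Cert}[B/A] = \sum_{j=1}^{\bar{n}} w_j\, \bar{a}_{\bar{\sigma}(j)}, \qquad \text{where } w_j = Q\!\left(\frac{j}{\bar{n}}\right) - Q\!\left(\frac{j-1}{\bar{n}}\right).$$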


Let us consider the resulting formulations for some different examples of Q. If Q is $Q_*$ then $w_{\bar{n}} = 1$ and $w_j = 0$ for all other j, and $Q_*\text{-Cert}[B / A] = \min_{x_j \notin B}[\bar{A}(x_j)]$. This was our original definition of Cert[B / A]. If we select $Q = Q^*$, then we get $Q^*\text{-Cert}[B / A] = \max_{x_j \notin B}[\bar{A}(x_j)]$. This is what Dubois and Prade called the un-guaranteed necessity. Another special case is where $Q(x) = x$. In this case we have $w_j = \frac{1}{\bar{n}}$ and

$$Q\text{-Cert}[V \text{ is } B \,/\, V \text{ is } A] = \frac{1}{\bar{n}} \sum_{x_j \notin B} \bar{A}(x_j);$$

it is the average of $\bar{A}(x_j)$ for those $x_j$ not in B, where $\bar{n}$ is the number of elements not in B.
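The three special cases can be verified with a short Python sketch of the crisp-B formulation. The function name `q_cert_crisp`, the sample possibility distribution, and the convention of returning 1 when no element lies outside B are our own assumptions.

```python
def q_cert_crisp(A, B, Q):
    """Q-Cert[B / A] for a crisp set B.

    A: dict mapping each element of X to its possibility
    B: set of elements (crisp)
    Q: quantifier, a function from [0, 1] to [0, 1]
    """
    # Negated possibilities of the elements outside B, largest first.
    negated = sorted((1 - A[x] for x in A if x not in B), reverse=True)
    n_bar = len(negated)
    if n_bar == 0:
        return 1.0  # assumption: nothing lies outside B, so treat B as certain
    return sum(
        (Q(j / n_bar) - Q((j - 1) / n_bar)) * a
        for j, a in enumerate(negated, start=1)
    )


def q_all(r):
    return 1.0 if r == 1 else 0.0


def q_any(r):
    return 0.0 if r == 0 else 1.0


# Hypothetical possibility distribution; B = {x1}.
A = {"x1": 1.0, "x2": 0.6, "x3": 0.3, "x4": 0.0}
B = {"x1"}
print(q_cert_crisp(A, B, q_all))        # minimum of the negated possibilities
print(q_cert_crisp(A, B, q_any))        # maximum of the negated possibilities
print(q_cert_crisp(A, B, lambda r: r))  # their average
```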

In the following, we shall look at a useful family of Q-Cert[V is B / V is A] based on a class of quantifiers parameterized by a single scalar value $\lambda$. Consider the function shown in Figure 9.

Figure 9. Quantifiers parameterized by $\lambda$

We shall denote $\lambda$ as the strength of necessity. We easily see that when $\lambda = 1$ we get the strongest measure, $Q_\lambda\text{-Cert}[B / A] = \min_{x_j \notin B}[\bar{A}(x_j)]$. When $\lambda = 0$ we get $Q_\lambda\text{-Cert}[B / A] = \frac{1}{\bar{n}} \sum_{x_j \notin B} \bar{A}(x_j)$, where $\bar{n}$ is the number of elements not in B. Generally we observe that as $\lambda$ moves from 0 to 1, the value of $Q_\lambda\text{-Cert}[B / A]$ decreases. If we impose the additional assumption that A is also crisp, we aid in developing a deeper intuitive understanding of the class of formulations for uncertainty we have introduced. In the case when $\lambda = 1$, to be certain of the truth of the statement V is B we require that if an outcome is possible, in A, it is also in B. In the case where $\lambda = 0$ we have

$$Q_\lambda\text{-Cert}[B / A] = \frac{1}{\bar{n}} \sum_{x_j \notin B} \bar{A}(x_j) = 1 - \frac{1}{\bar{n}} \sum_{x_j \notin B} A(x_j).$$

Here we take the average possibility of the elements not in B and subtract that from one.

In this section we have described a family of definitions for the idea of the certainty of V is B given V is A based on the parameter Q, which we denoted Q-Cert[B / A]. By appropriately choosing the quantifier Q we can model the formulation we want to use for our concept of certainty. One important way in which these definitions for certainty differ is with respect to their strictness. Recalling that Q-Cert[B / A] is defined as the truth of the statement "Q of the elements not in B are not possible given A," we see that the larger Q, the stricter the definition. In order to more formally capture this idea of strictness we can associate with any quantifier Q a value called its attitudinal character, defined as

$$A\text{-}C(Q) = \int_0^1 Q(y)\, dy$$
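The attitudinal character can be approximated numerically. The following Python sketch uses a midpoint rule on an evenly spaced grid; the function names and the number of steps are our own assumptions, and the result is only an approximation of the integral.

```python
def attitudinal_character(Q, steps=10_000):
    """Approximate the integral of Q over [0, 1] with the midpoint rule."""
    h = 1.0 / steps
    return sum(Q((k + 0.5) * h) for k in range(steps)) * h


def q_all(r):
    return 1.0 if r == 1 else 0.0


def q_any(r):
    return 0.0 if r == 0 else 1.0


for name, Q in [("all", q_all), ("any", q_any), ("identity", lambda r: r)]:
    print(name, round(attitudinal_character(Q), 6))
```

The quantifier all, the strictest of the three, yields the smallest value, and the quantifier any yields the largest.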

It can be shown that $A\text{-}C(Q) \in [0, 1]$. Also we can show that for $Q = Q_*$ we get $A\text{-}C(Q_*) = 0$, for $Q = Q^*$ we get $A\text{-}C(Q^*) = 1$ and for $Q(x) = x$ we get $A\text{-}C(Q) = 0.5$. Thus the smaller the value of A-C the stricter our concept of certainty.